
Human Galactokinase Gene 

This invention was made in part with government support under EY-09404
awarded by the National Institutes of Health. The U.S. Government has certain
rights in the invention.

Cross-Reference to Related Applications: 

This application is a continuation in part of Serial No. PCT/US94/10825,
filed 23 September 1994.

Field of the Invention: 

This invention relates to human galactokinase and the identification of
galactokinase mutations, a missense and a nonsense mutation, as well as isolated
nucleic acids encoding same, recombinant host cells transformed with DNA encoding
such proteins and to uses of the expressed proteins and nucleic acid sequences in
therapeutic and diagnostic applications.

Background of the Invention: 

There are numerous inherited human metabolic disorders, most of which are
recessive. Many have devastating effects that may include a combination of several
clinical features, such as severe mental retardation, impairment of the peripheral
nervous system, blindness, hearing deficiency and organomegaly. Most of the
disorders are rare. However, the majority of such disorders cannot be treated by
drugs.

Galactokinase deficiency is one of three known forms of galactosemia. The
other forms are galactose-1-phosphate uridyltransferase deficiency and
UDP-galactose-4-epimerase deficiency. All three enzymes are involved in galactose
metabolism, i.e., the conversion of galactose to glucose in the body. Galactokinase
deficiency is inherited as an autosomal recessive trait with a heterozygote frequency
estimated to be 0.2% in the general population (see, e.g., Levy et al., J. Pediatr.,
22:871-877 (1978)). Patients with homozygous galactokinase deficiency usually
become symptomatic in the early infantile period showing galactosemia,
galactosuria, increased galactitol levels, cataracts and, in a few cases, mental
retardation (Segal et al., J. Pediatr., 25:750-752 (1979)). These symptoms usually
improve dramatically with the administration of a galactose-free diet.
Heterozygotes for galactokinase deficiency are prone to presenile cataracts with
onset during 20-50 years of age (Stambolian et al., Invest. Ophthal. Vis. Sci.,
22:429-433 (1986)).

Galactokinase activity has been found in a variety of mammalian tissues,
including liver, kidney, brain, lens, placenta, erythrocytes and leukocytes. While
the protein has been purified from E. coli, the purification of the protein from
mammalian tissues has proven difficult due to its low cellular concentration. In
addition, the molecular basis of galactokinase deficiency is unknown.

This invention provides a human galactokinase gene. The DNAs of this
invention, such as the specific sequences disclosed herein, are useful in that they
encode the genetic information required for expression of this protein. Additionally,
the sequences may be used as probes in order to isolate and identify additional
members of the family, type and/or subtype, as well as mutations which may form the
basis of galactokinase deficiency, which may be characterized by site-specific
mutations or by atypical expression of the galactokinase gene. The galactokinase
gene is also useful as a diagnostic agent to identify mutant galactokinase proteins or
as a therapeutic agent via gene therapy.

The first clinical trials of gene therapy began in 1990. Since that time,
more than 70 clinical trial protocols have been reviewed and approved by a
regulatory authority such as the NIH's Recombinant DNA Advisory Committee (RAC),
see, e.g., Anderson, W. F., Human Gene Therapy, i:281-282 (1994). The
therapeutic treatment of diseases and disorders by gene therapy involves the transfer
and stable insertion of new genetic information into cells. The correction of a
genetic defect by re-introduction of the normal allele of a gene has hence
demonstrated that this concept is clinically feasible (see, e.g., Rosenberg et al., New
Eng. J. Med., 222:570 (1990)).

These and additional uses for the reagents described herein will become
apparent to those of ordinary skill in the art upon reading this specification.

Summary of the Invention: 

This invention provides isolated nucleic acid molecules encoding human
galactokinase, as well as nucleic acid molecules encoding missense and nonsense
mutations, which include mRNAs, DNAs (e.g., cDNA, genomic DNA, etc.), as
well as antisense analogs thereof and diagnostically or therapeutically useful
fragments thereof.

This invention also provides recombinant vectors, such as cloning and
expression plasmids useful as reagents in the recombinant production of human
galactokinase proteins, as well as recombinant prokaryotic and/or eukaryotic host
cells comprising a human galactokinase nucleic acid sequence.

This invention also provides a process for preparing human galactokinase
proteins which comprises culturing recombinant prokaryotic and/or eukaryotic host
cells, containing a human galactokinase nucleic acid sequence, under conditions
promoting expression of said protein and subsequent recovery of said
protein. Another related aspect of this invention is isolated human galactokinase
proteins produced by said method. In yet another aspect, this invention also
provides antibodies that are directed to (i.e., bind) human galactokinase proteins.

This invention also provides isolated human galactokinase proteins
having a missense or nonsense mutation and antibodies (monoclonal or polyclonal)
that are specifically reactive with said proteins.

This invention also provides nucleic acid probes and PCR primers
comprising nucleic acid molecules of sufficient length to specifically hybridize to
human galactokinase sequences.

This invention also provides a method to diagnose human galactokinase
deficiency which comprises isolating a nucleic acid sample from an individual and
assaying the sequence of said nucleic acid sample with the reference gene of the
invention and comparing differences between said sample and the nucleic acid of
the instant invention, wherein said differences indicate mutations in the human
galactokinase gene isolated from an individual. The sample can be assayed by
direct sequence comparison (i.e., DNA sequencing), wherein the sample nucleic
acid can be compared to the reference galactokinase gene, by hybridization (e.g.,
mobility shift assays such as heteroduplex gel electrophoresis, SSCP or other
techniques such as Northern or Southern blotting which are based upon the length of
the nucleic acid sequence) or other known gel electrophoresis methods such as
RFLP (for example, by restriction endonuclease digestion of a sample amplified by
PCR (for DNA) or RT-PCR (for RNA)). Alternatively, the diagnostic method
comprises isolating cells from an individual containing genomic DNA and assaying
said sample (e.g., cellular RNA) by in situ hybridization using the DNA sequence of
the invention, or at least one exon, or a fragment containing at least 15, preferably
18, and more preferably 21 contiguous base pairs as a probe. This invention also
provides an antisense oligonucleotide having a sequence capable of binding with
mRNAs encoding human galactokinase so as to identify mutant galactokinase genes.

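
By way of illustration only, the direct sequence comparison described above may
be sketched as follows. The function name find_substitutions and the short toy
sequences are hypothetical and are not taken from the sequences of the invention.
The sketch assumes that the sample and reference sequences are already aligned and
of equal length, and it reports nucleotide substitutions only (not insertions or
deletions).

```python
def find_substitutions(reference, sample):
    """List (position, reference_base, sample_base) for each mismatch.

    Positions are 1-based. Both sequences are assumed to be pre-aligned and
    of equal length; insertions and deletions are outside this sketch.
    """
    if len(reference) != len(sample):
        raise ValueError("sequences must be aligned to the same length")
    pairs = zip(reference.upper(), sample.upper())
    return [(pos, ref, alt)
            for pos, (ref, alt) in enumerate(pairs, start=1)
            if ref != alt]


# Hypothetical toy sequences (not SEQ ID NO:4): a single G to A substitution.
toy_reference = "ATGGTGGCT"
toy_sample = "ATGATGGCT"
print(find_substitutions(toy_reference, toy_sample))
```

An empty list indicates that no substitution was found relative to the reference
over the compared region.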
This invention also provides yet another method to diagnose human
galactokinase deficiency which comprises obtaining a serum or tissue sample;
allowing such sample to come in contact with an antibody or antibody fragment
which specifically binds to a mutant human galactokinase protein of the invention
under conditions such that an antigen-antibody complex is formed between said
antibody (or antibody fragment) and said mutant galactokinase protein; and
detecting the presence or absence of said complex.

This invention also provides transgenic non-human animals comprising a
nucleic acid molecule encoding human galactokinase. Also provided are methods
for use of said transgenic animals as models for disease states, mutation and SAR.

This invention also provides a method for treating conditions which are
related to insufficient human galactokinase activity which comprises administering to
a patient in need thereof a pharmaceutical composition containing the galactokinase
protein of the invention which is effective to supplement a patient's endogenous
galactokinase and thereby alleviate said condition.

This invention also provides a method for treating conditions which are
related to insufficient human galactokinase activity via gene therapy. An additional,
or reference, gene comprising the non-mutant galactokinase gene of the instant
invention is inserted into a patient's cells either in vivo or ex vivo. The reference
gene is expressed in transfected cells and as a result, the protein encoded by the
reference gene corrects the defect (i.e., galactokinase deficiency) thus permitting the
transfected cells to function normally and alleviating disease conditions (or
symptoms).


Brief Description of the Drawings: 

Figure 1 depicts the intron/exon organization of the human galactokinase
gene.

Figure 2 is the genomic DNA sequence (and single letter amino acid
abbreviations) for human galactokinase [SEQ ID NO:7]. The bolded DNA
sequence corresponds to the exon regions whereas the normal or unbolded type
corresponds to the intron regions of human galactokinase.



Detailed Description of the Invention: 

This invention relates to human galactokinase (amino acid and nucleotide
sequences) and its use as a diagnostic and therapeutic. The particular cDNA and
amino acid sequence of human galactokinase is identified by SEQ ID NO:4 as
described more fully below. This invention also relates to the genomic DNA
sequence for human galactokinase [SEQ ID NO:7] and also to mutant human
galactokinase genes and amino acid sequences [SEQ ID NO:5 and 6] and their use
for diagnostic purposes.

In further describing the present invention, the following additional terms
will be employed, and are intended to be defined as indicated below.

An "antigen" refers to a molecule containing one or more epitopes
that will stimulate a host's immune system to make a humoral and/or cellular
antigen-specific response. The term is also used herein interchangeably with
"immunogen."

The term "epitope" refers to the site on an antigen or hapten to which 
a specific antibody molecule binds. The term is also used herein interchangeably 
with "antigenic determinant" or "antigenic determinant site." 

A coding sequence is "operably linked to" another coding sequence
when RNA polymerase will transcribe the two coding sequences into a single
mRNA, which is then translated into a single polypeptide having amino acids
derived from both coding sequences. The coding sequences need not be contiguous
to one another so long as the expressed sequence is ultimately processed to produce
the desired protein.

"Recombinant" polypeptides refer to polypeptides produced by
recombinant DNA techniques; i.e., produced from cells transformed by an
exogenous DNA construct encoding the desired polypeptide. "Synthetic"
polypeptides are those prepared by chemical synthesis.

A "replicon" is any genetic element (e.g., plasmid, chromosome,
virus) that functions as an autonomous unit of DNA replication in vivo; i.e., capable
of replication under its own control.

A "vector" is a replicon, such as a plasmid, phage, or cosmid, to 
which another DNA segment may be attached so as to bring about the replication of 
the attached segment. 

A "replication-deficient virus" is a virus in which the excision and/or
replication functions have been altered such that after transfection into a host cell,
the virus is not able to reproduce and/or infect additional cells.

A "reference" gene refers to the galactokinase sequence of the
invention and is understood to include the various sequence polymorphisms that
exist, wherein nucleotide substitutions in the gene sequence exist, but do not affect
the essential function of the gene product.

A "mutant" gene refers to galactokinase sequences different from the
reference gene wherein nucleotide substitutions and/or deletions and/or insertions
result in impairment of the essential function of the gene product such that the levels
of galactose in an individual (or patient) are atypically elevated. For example, the G
to A substitution at position 122 of human galactokinase [SEQ ID NO:5] is a
missense mutation associated with patients who are galactokinase deficient. Another
T for G substitution produces an in-frame nonsense codon at amino acid position 80
of the mature protein. The result is a truncated protein consisting of the first 79
amino acids of human galactokinase.
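
The distinction between a missense and a nonsense substitution may be illustrated
with a minimal sketch, assuming the standard genetic code. The toy coding
sequences below are hypothetical and are not SEQ ID NO:4, 5 or 6; the function
name translate is likewise an assumption made for illustration. A G to A change
alters one amino acid, whereas a T for G change creates an in-frame stop codon and
truncates the translated product.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(cds):
    """Translate a coding sequence, stopping at the first in-frame stop codon."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        amino_acid = CODON_TABLE[cds[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequences (not taken from the sequence listing).
toy_reference = "ATGGTGGCTGAGTTG"  # ATG GTG GCT GAG TTG
toy_missense = "ATGATGGCTGAGTTG"   # G to A: GTG (Val) becomes ATG (Met)
toy_nonsense = "ATGGTGGCTTAGTTG"   # T for G: GAG (Glu) becomes TAG (stop)

for name, seq in [("reference", toy_reference),
                  ("missense", toy_missense),
                  ("nonsense", toy_nonsense)]:
    print(name, translate(seq))
```

In this toy case the nonsense variant yields only the amino acids preceding the
new stop codon, which parallels the truncation described above.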

A DNA "coding sequence of" or a "nucleotide sequence encoding" a
particular protein, is a DNA sequence which is transcribed and translated into a
polypeptide when placed under the control of appropriate regulatory sequences.

A "promoter sequence" is a DNA regulatory region capable of
binding RNA polymerase in a cell and initiating transcription of a downstream (3'
direction) coding sequence. For purposes of defining the present invention, the
promoter sequence is bounded at the 3' terminus by a translation start codon (e.g.,
ATG) of a coding sequence and extends upstream (5' direction) to include the
minimum number of bases or elements necessary to initiate transcription at levels
detectable above background. Within the promoter sequence will be found a
transcription initiation site (conveniently defined by mapping with nuclease S1), as
well as protein binding domains (consensus sequences) responsible for the binding
of RNA polymerase. Eukaryotic promoters will often, but not always, contain
"TATA" boxes and "CAT" boxes. Prokaryotic promoters contain Shine-Dalgarno
sequences in addition to the -10 and -35 consensus sequences.

DNA "control sequences" refers collectively to promoter sequences,
ribosome binding sites, polyadenylation signals, transcription termination sequences,
upstream regulatory domains, enhancers, and the like, which collectively provide for
the expression (i.e., the transcription and translation) of a coding sequence in a host
cell.

A control sequence "directs the expression" of a coding sequence in a
cell when RNA polymerase will bind the promoter sequence and transcribe the
coding sequence into mRNA, which is then translated into the polypeptide encoded
by the coding sequence.

A "host cell" is a cell which has been transformed or transfected, or is
capable of transformation or transfection by an exogenous DNA sequence.

A cell has been "transformed" by exogenous DNA when such
exogenous DNA has been introduced inside the cell membrane. Exogenous DNA
may or may not be integrated (covalently linked) into chromosomal DNA making up
the genome of the cell. In prokaryotes and yeasts, for example, the exogenous DNA
may be maintained on an episomal element, such as a plasmid. With respect to
eukaryotic cells, a stably transformed or transfected cell is one in which the
exogenous DNA has become integrated into the chromosome so that it is inherited
by daughter cells through chromosome replication. This stability is demonstrated by
the ability of the eukaryotic cell to establish cell lines or clones comprised of a
population of daughter cells containing the exogenous DNA.

"Transfection" or "transfected" refers to a process by which cells take
up foreign DNA and integrate that foreign DNA into their chromosome.
Transfection can be accomplished, for example, by various techniques in which
cells take up DNA (e.g., calcium phosphate precipitation, electroporation,
assimilation of liposomes, etc.), or by infection, in which viruses are used to transfer
DNA into cells.

A "target cell" is a cell(s) that is selectively transfected over other
cell types (or cell lines).

A "clone" is a population of cells derived from a single cell or
common ancestor by mitosis. A "cell line" is a clone of a primary cell that is
capable of stable growth in vitro for many generations.

A "heterologous" region of a DNA construct is an identifiable
segment of DNA within or attached to another DNA molecule that is not found in
association with the other molecule in nature. Thus, when the heterologous region
encodes a gene, the gene will usually be flanked by DNA that does not flank the
gene in the genome of the source animal. Another example of a heterologous coding
sequence is a construct where the coding sequence itself is not found in nature (e.g.,
synthetic sequences having codons different from the native gene). Allelic variation
or naturally occurring mutational events do not give rise to a heterologous region of
DNA, as used herein.

"Conditions which are related to insufficient human galactokinase activity"
or a "deficiency in galactokinase activity" means mutations of the galactokinase
protein which affect galactokinase activity or may affect expression of
galactokinase or both such that the levels of galactose in a patient are atypically
elevated. In addition, this definition is intended to cover atypically low levels of
galactokinase expression in a patient due to defective control sequences for the
reference galactokinase protein.

This invention provides an isolated nucleic acid molecule encoding a human
galactokinase protein and substantially similar sequences. Isolated nucleic acid
sequences are "substantially similar" if: (i) they are approximately the same length
(i.e., at least 80% of the coding region of SEQ ID NO:4); (ii) they encode a protein
with the same (i.e., within an order of magnitude) galactokinase activity as the
protein encoded by SEQ ID NO:4; and (iii) they are capable of hybridizing under
moderately stringent conditions to SEQ ID NO:4; or they encode DNA sequences
which are degenerate to SEQ ID NO:4. Degenerate DNA sequences encode the
same amino acid sequence as SEQ ID NO:4, but have variation(s) in the nucleotide
coding sequences. Hybridization under moderately stringent conditions is outlined
below.

Hybridization under moderately stringent conditions can be performed as
follows. Nitrocellulose filters are prehybridized at 65°C in a solution containing 6X
SSPE, 5X Denhardt's solution (10 g Ficoll, 10 g BSA and 10 g polyvinylpyrrolidone
per liter solution), 0.05% SDS and 100 micrograms tRNA. Hybridization probes are
labeled, preferably radiolabeled (e.g., using the Bios TAG-IT® kit). Hybridization is
then carried out for approximately 18 hours at 65°C. The filters are then washed in a
solution of 2X SSC and 0.5% SDS at room temperature for 15 minutes (repeated
once). Subsequently, the filters are washed at 58°C, air-dried and exposed to X-ray
film overnight at -70°C with an intensifying screen.

Alternatively, "substantially similar" sequences are substantially the same
when about 66% (preferably about 75%, and most preferably about 90%) of the
nucleotides or amino acids match over a defined length (i.e., at least 80% of the
coding region of SEQ ID NO:4) of the molecule and the protein encoded by such
sequence has the same (i.e., within an order of magnitude) galactokinase activity as
the protein encoded by SEQ ID NO:4. As used herein, substantially similar refers to
the sequences having similar identity to the sequences of the instant invention. Thus
nucleotide sequences that are substantially the same can be identified by
hybridization or by sequence comparison. Protein sequences that are substantially
the same can be identified by one or more of the following: proteolytic digestion, gel
electrophoresis and/or microsequencing.
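
The sequence-comparison criterion above (about 66% matching positions over at
least 80% of the coding region) may be sketched as follows. This is a simplified
illustration only: it assumes an ungapped, position-by-position comparison starting
at the first residue, the function names are hypothetical, and the galactokinase
activity requirement cannot be evaluated from sequence alone and is not modeled.

```python
def percent_identity(a, b):
    """Percent of matching positions over the shorter of two sequences.

    Assumes an ungapped comparison starting at position 1 of both sequences.
    """
    compared = min(len(a), len(b))
    if compared == 0:
        raise ValueError("cannot compare an empty sequence")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / compared


def meets_identity_criterion(reference, candidate,
                             min_identity=66.0, min_coverage=0.80):
    """Check the identity and length thresholds stated in the text.

    The thresholds default to about 66% identity over at least 80% of the
    reference length; enzymatic activity is outside this sketch.
    """
    compared = min(len(reference), len(candidate))
    if compared < min_coverage * len(reference):
        return False
    return percent_identity(reference, candidate) >= min_identity


# Hypothetical toy sequences, not SEQ ID NO:4.
print(percent_identity("ATGGTGGCTG", "ATGATGGCTG"))
print(meets_identity_criterion("ATGGTGGCTG", "ATGATGGCTG"))
```

Passing min_identity=75.0 or min_identity=90.0 applies the preferred and most
preferred thresholds, respectively.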

This invention also provides isolated nucleic acid molecules encoding a
missense mutation (SEQ ID NO:5) or a nonsense mutation (SEQ ID NO:6) of the
human galactokinase protein and DNA sequences which are degenerate to SEQ ID
NO:5 or 6. Degenerate DNA sequences encode the same amino acid (or termination
site) sequence as SEQ ID NO:5 or 6, but have variation(s) in the nucleotide coding
sequences.
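
Degeneracy as defined above may be checked, in a minimal sketch assuming the
standard genetic code, by translating two coding sequences and comparing the
results, termination sites included. The toy sequences and the function names
translate_all and is_degenerate_to are hypothetical and are not taken from the
sequence listing.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; "*" marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate_all(cds):
    """Translate every complete codon, keeping "*" for termination sites."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]]
                   for i in range(0, len(cds) - 2, 3))


def is_degenerate_to(reference, candidate):
    """True if the nucleotides differ but the encoded sequence is identical."""
    same_nucleotides = reference.upper() == candidate.upper()
    same_product = translate_all(reference) == translate_all(candidate)
    return same_product and not same_nucleotides


# Hypothetical toy sequences: both encode Met-Val-Ala-Glu followed by a stop.
toy_reference = "ATGGTGGCTGAGTAA"
toy_degenerate = "ATGGTCGCAGAATGA"
print(is_degenerate_to(toy_reference, toy_degenerate))
```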

One means for isolating a nucleic acid molecule encoding a human
galactokinase is to probe a human genomic or cDNA library with a natural or
artificially designed probe using art recognized procedures (see, for example:
"Current Protocols in Molecular Biology", Ausubel, F.M., et al. (eds.) Greene
Publishing Assoc. and John Wiley Interscience, New York, 1989, 1992). It is
appreciated by one skilled in the art that SEQ ID NO:4, or fragments thereof
(comprising at least 15 contiguous nucleotides), is a particularly useful probe.
Several particularly useful probes for this purpose are set forth in Table 1 or
hybridizable fragments thereof (i.e., comprising at least 15 contiguous nucleotides).
It is also appreciated that such probes can be and are preferably labeled with an
analytically detectable reagent to facilitate identification of the probe. Useful
reagents include but are not limited to radioactivity, fluorescent dyes or enzymes
capable of catalyzing the formation of a detectable product. The probes are thus
useful to isolate complementary copies of genomic DNA, cDNA or RNA from
human, mammalian or other animal sources or to screen such sources for related
sequences (e.g., additional members of the family, type and/or subtype) and
including transcriptional regulatory and control elements defined above as well as
other stability, processing, translation and tissue specificity-determining regions
from 5' and/or 3' regions relative to the coding sequences disclosed herein.
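
The use of a fragment of at least 15 contiguous nucleotides as a probe may be
illustrated with the following sketch. It is a crude stand-in for hybridization:
it looks only for exact matches of the probe, or of its reverse complement, within
one written strand of a target, and ignores stringency and mismatches. The toy
target, the function names and the strand labels are assumptions made for
illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def probe_sites(target, probe, min_length=15):
    """Find exact occurrences of a probe in a target sequence.

    Returns (1-based start, strand) pairs, where "+" means the probe matches
    the written strand and "-" means its reverse complement does.
    """
    if len(probe) < min_length:
        raise ValueError(f"probe must have at least {min_length} nucleotides")
    target = target.upper()
    sites = []
    for strand, query in (("+", probe.upper()),
                          ("-", reverse_complement(probe))):
        start = target.find(query)
        while start != -1:
            sites.append((start + 1, strand))
            start = target.find(query, start + 1)
    return sorted(sites)


# Hypothetical toy target and a 15-nucleotide probe drawn from it.
toy_target = "GGGATGGTGGCTGAGTTGCCCAAA"
toy_probe = toy_target[3:18]
print(probe_sites(toy_target, toy_probe))
print(probe_sites(toy_target, reverse_complement(toy_probe)))
```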

This invention also provides for gene therapy. "Gene therapy" means gene
supplementation. That is, an additional (i.e., reference) copy of the gene of interest
is inserted into a patient's cells. As a result, the protein encoded by the reference
gene corrects the defect (i.e., galactokinase deficiency) and permits the cells to
function normally thus alleviating disease symptoms.

Gene therapy of the present invention can occur in vivo or ex vivo. Ex vivo
gene therapy requires the isolation and purification of patient cells, the introduction
of a therapeutic gene, and introduction of the genetically altered cells back into the
patient. A replication-deficient virus such as a modified retrovirus can be used to
introduce the therapeutic gene (galactokinase) into such cells. For example, mouse
Moloney leukemia virus (MMLV) is a well-known vector in clinical gene therapy
trials (see, e.g., Boris-Lawrie et al., Curr. Opin. Genet. Dev., 2:102-109 (1993)).

In contrast, in vivo gene therapy does not require isolation and purification
of the patient's cells. The therapeutic gene is typically "packaged" for administration to
a patient such as in liposomes or in a replication-deficient virus such as adenovirus
(see, e.g., Berkner, K.L., Curr. Top. Microbiol. Immunol., 39-66 (1992)) or
adeno-associated virus (AAV) vectors (see, e.g., Muzyczka, N., Curr. Top.
Microbiol. Immunol., 153:97-129 (1992) and U.S. Patent 5,252,479 "Safe Vector
for Gene Therapy"). Another approach is administration of so-called "naked DNA"
in which the therapeutic gene is directly injected into the bloodstream or muscle
tissue.

Cell types useful for gene therapy of the present invention include
hepatocytes, fibroblasts, lymphocytes, any cell of the eye (e.g., retina), epithelial
and endothelial cells. Preferably the cells are hepatocytes, any cell of the eye or
respiratory (or pulmonary) epithelial cells. Transfection of (pulmonary) epithelial
cells can occur via inhalation of a nebulized preparation of DNA vectors in
liposomes, DNA-protein complexes or replication-deficient adenoviruses (see, e.g.,
U.S. Patent 5,240,846 "Gene Therapy Vector for Cystic Fibrosis").

This invention also provides for a process to prepare human galactokinase
proteins. Non-mutant proteins are defined with reference to the amino acid sequence
listed in SEQ ID NO:4 and include variants with a substantially similar amino acid
sequence that have the same galactokinase activity. Additional proteins of this
invention include mutant human galactokinase proteins as set forth in SEQ ID NO:5
or 6. The proteins of this invention are preferably made by recombinant genetic
engineering techniques. The isolated nucleic acids, particularly the DNAs, can be
introduced into expression vectors by operatively linking the DNA to the necessary
expression control regions (e.g., regulatory regions) required for gene expression.
The vectors can be introduced into the appropriate host cells such as prokaryotic
(e.g., bacterial), or eukaryotic (e.g., yeast or mammalian) cells by methods well
known in the art (Ausubel et al., supra). The coding sequences for the desired
proteins, having been prepared or isolated, can be cloned into any suitable vector or
replicon. Numerous cloning vectors are known to those of skill in the art, and the
selection of an appropriate cloning vector is a matter of choice. Examples of
recombinant DNA vectors for cloning and host cells which they can transform
include, but are not limited to, the bacteriophage lambda (E. coli), pBR322 (E. coli),
pACYC177 (E. coli), pKT230 (gram-negative bacteria), pGV1106 (gram-negative
bacteria), pLAFR1 (gram-negative bacteria), pME290 (non-E. coli gram-negative
bacteria), pHV14 (E. coli and Bacillus subtilis), pBD9 (Bacillus), pIJ61
(Streptomyces), pUC6 (Streptomyces), YIp5 (Saccharomyces), a baculovirus insect
cell system, a Drosophila insect system, and YCp19 (Saccharomyces). See, generally,
"DNA Cloning", Vols. I & II, Glover et al., ed., IRL Press, Oxford (1985) (1987); and
T. Maniatis et al., "Molecular Cloning", Cold Spring Harbor Laboratory (1982).

The gene can be placed under the control of a promoter, ribosome
binding site (for bacterial expression) and, optionally, an operator (collectively
referred to herein as "control" elements), so that the DNA sequence encoding the
desired protein is transcribed into RNA in the host cell transformed by a vector
containing this expression construction. The coding sequence may or may not
contain a signal peptide or leader sequence. The subunit antigens of the present
invention can be expressed using, for example, the E. coli tac promoter or the protein
A gene (spa) promoter and signal sequence. Leader sequences can be removed by the
bacterial host in post-translational processing. See, e.g., U.S. Patent Nos. 4,431,739;
4,425,437; 4,338,397.

In addition to control sequences, it may be desirable to add regulatory
sequences which allow for regulation of the expression of the protein sequences
relative to the growth of the host cell. Regulatory sequences are known to those of
skill in the art, and examples include those which cause the expression of a gene to be
turned on or off in response to a chemical or physical stimulus, including the presence
of a regulatory compound. Other types of regulatory elements may also be present in
the vector, for example, enhancer sequences.

An expression vector is constructed so that the particular coding
sequence is located in the vector with the appropriate regulatory sequences, the
positioning and orientation of the coding sequence with respect to the control
sequences being such that the coding sequence is transcribed under the "control" of
the control sequences (i.e., RNA polymerase which binds to the DNA molecule at the
control sequences transcribes the coding sequence). Modification of the sequences
encoding the particular antigen of interest may be desirable to achieve this end. For
example, in some cases it may be necessary to modify the sequence so that it may be
attached to the control sequences with the appropriate orientation; i.e., to maintain
the reading frame. The control sequences and other regulatory sequences may be
ligated to the coding sequence prior to insertion into a vector, such as the cloning
vectors described above. Alternatively, the coding sequence can be cloned directly
into an expression vector which already contains the control sequences and an
appropriate restriction site.

In some cases, it may be desirable to produce other mutants or analogs
of the galactokinase protein. Mutants or analogs may be prepared by the deletion of a
portion of the sequence encoding the protein, by insertion of a sequence, and/or by
substitution of one or more nucleotides within the sequence. Techniques for
modifying nucleotide sequences, such as site-directed mutagenesis, are well known to
those skilled in the art. See, e.g., T. Maniatis et al., supra; DNA Cloning, Vols. I and
II, supra; Nucleic Acid Hybridization, supra.

A number of prokaryotic expression vectors are known in the art.
See, e.g., U.S. Patent Nos. 4,578,355; 4,440,859; 4,436,815; 4,431,740; 4,431,739;
4,428,941; 4,425,437; 4,418,149; 4,411,994; 4,366,246; 4,342,832; see also U.K.
Patent Applications GB 2,121,054; GB 2,008,123; GB 2,007,675; and European
Patent Application 103,395. Yeast expression vectors are also known in the art.
See, e.g., U.S. Patent Nos. 4,446,235; 4,443,539; 4,430,428; see also European Patent
Applications 103,409; 100,561; 96,491. Also known are pSV2neo (as described in
J. Mol. Appl. Genet., 1:327-341), which uses the SV40 late promoter to drive
expression in mammalian cells, and pCDNA1neo, a vector derived from pCDNA1
(Mol. Cell Biol., 7:4125-29), which uses the CMV promoter to drive expression. Both
these latter two vectors can be employed for transient or stable (using G418
resistance) expression in mammalian cells. Insect cell expression systems, e.g.,
Drosophila, are also useful; see, for example, PCT applications WO 90/06358 and
WO 92/06212 as well as EP 290,261-B1.

Depending on the expression system and host selected, the proteins of
the present invention are produced by growing host cells transformed by an
expression vector described above under conditions whereby the protein of interest is
expressed. Preferred mammalian cells include human embryonic kidney (HEK-293)
cells, monkey kidney fibroblast (COS) cells, Chinese hamster ovary (CHO) cells,
Drosophila or murine L-cells. If the expression system secretes the protein into
growth media, the protein can be purified directly from the media. If the protein is
not secreted, it is isolated from cell lysates or recovered from the cell membrane
fraction. The selection of the appropriate growth conditions and recovery methods
are within the skill of the art.

An alternative method to identify proteins of the present invention is
by constructing gene libraries, using the resulting clones to transform E. coli and
pooling and screening individual colonies using polyclonal serum or monoclonal
antibodies to galactokinase.

The proteins of the present invention may also be produced by
chemical synthesis such as solid phase peptide synthesis, using known amino acid
sequences or amino acid sequences derived from the DNA sequence of the genes of
interest. Such methods are known to those skilled in the art. Chemical synthesis of
peptides is not particularly preferred.

The proteins of the present invention or their fragments comprising at
least one epitope can be used to produce antibodies, both polyclonal and monoclonal.
If polyclonal antibodies are desired, a selected mammal (e.g., mouse, rabbit, goat,
horse, etc.) is immunized with the protein of the present invention, or a fragment
thereof, capable of eliciting an immune response (i.e., having at least one epitope).
Serum from the immunized animal is collected and treated according to known
procedures. If serum containing polyclonal antibodies is used, the polyclonal
antibodies can be purified by immunoaffinity chromatography or other known
procedures.

Monoclonal antibodies to the proteins of the present invention, and to
the fragments thereof, can also be readily produced by one skilled in the art. The
general methodology for making monoclonal antibodies by using hybridoma
technology is well known. Immortal antibody-producing cell lines can be created by
cell fusion, and also by other techniques such as direct transformation of B
lymphocytes with oncogenic DNA, or transfection with Epstein-Barr virus. See, e.g.,
M. Schreier et al., "Hybridoma Techniques" (1980); Hammerling et al., "Monoclonal
Antibodies and T-cell Hybridomas" (1981); Kennett et al., "Monoclonal Antibodies"
(1980); see also U.S. Patent Nos. 4,341,761; 4,399,121; 4,427,783; 4,444,887;
4,452,570; 4,466,917; 4,472,500; 4,491,632; and 4,493,890. Panels of monoclonal
antibodies produced against the antigen of interest, or fragment thereof, can be
screened for various properties, i.e., for isotype, epitope, affinity, etc. Hence one
skilled in the art can produce monoclonal antibodies specifically reactive with mutant
galactokinase proteins, e.g., the missense mutation of SEQ ID NO:5 or the nonsense
mutation of SEQ ID NO:6. Monoclonal antibodies are useful in purification, using
immunoaffinity techniques, of the individual antigens which they are directed against.
Alternatively, genes encoding the monoclonals of interest may be isolated from the
hybridomas by PCR techniques known in the art and cloned and expressed in the
appropriate vectors. The antibodies of this invention, whether polyclonal or
monoclonal, have additional utility in that they may be employed as reagents in
immunoassays, RIA, ELISA, and the like. As used herein, "monoclonal antibody" is
understood to include antibodies derived from one species (e.g., murine, rabbit, goat,
rat, human, etc.) as well as antibodies derived from two (or perhaps more) species
(e.g., chimeric and humanized antibodies).

Chimeric antibodies, in which non-human variable regions are joined or fused
to human constant regions (see, e.g., Liu et al., Proc. Natl. Acad. Sci. USA, 84:3439
(1987)), may also be used in assays or therapeutically. Preferably, a therapeutic
monoclonal antibody would be "humanized" as described in Jones et al., Nature,
321:522 (1986); Verhoeyen et al., Science, 239:1534 (1988); Kabat et al., J.
Immunol., 147:1709 (1991); Queen et al., Proc. Natl. Acad. Sci. USA, 86:10029
(1989); Gorman et al., Proc. Natl. Acad. Sci. USA, 88:4181 (1991); and Hodgson et
al., Bio/Technology, 9:421 (1991). Therefore, this invention also contemplates
antibodies, polyclonal or monoclonal (including chimeric and "humanized"), directed
to epitopes corresponding to amino acid sequences disclosed herein from human
galactokinase. Methods for the production of polyclonal and monoclonal antibodies
are well known, see for example Chap. 11 of Ausubel et al. (supra).

When the antibody is labeled with an analytically detectable reagent such as
radioactivity, fluorescence, or an enzyme, the antibody can be used to detect the
presence or absence of human galactokinase and/or its quantitative level. In addition,
antibodies (polyclonal or monoclonal) specific for the missense and nonsense
mutations of the present invention are useful for diagnostic purposes. A serum or
tissue sample (e.g., liver, lung, etc.) is obtained and allowed to come in contact with
an antibody or antibody fragment which specifically binds to a mutant human
galactokinase protein of the invention under conditions such that an
antigen-antibody complex is formed between said antibody (or antibody fragment) and said
mutant galactokinase protein. Methods for detecting the presence or absence of said
complex are within the skill of the art (e.g., ELISA, RIA, Western blotting, optical
biosensor (e.g., BIAcore, Pharmacia Biosensor, Uppsala, Sweden)) and do not limit
this invention.

This invention also contemplates pharmaceutical compositions
comprising an effective amount of the galactokinase protein of the invention and a
pharmaceutically acceptable carrier. Pharmaceutical compositions of proteinaceous
drugs of this invention are particularly useful for parenteral administration, i.e.,
subcutaneously, intramuscularly or intravenously. Optionally, the galactokinase
protein is surrounded by a membrane bound vesicle, such as a liposome.

The compositions for parenteral administration will commonly comprise a
solution of the compounds of the invention or a cocktail thereof dissolved in an
acceptable carrier, preferably an aqueous carrier. A variety of aqueous carriers may
be employed, e.g., water, buffered water, 0.4% saline, 0.3% glycine, and the like.
These solutions are sterile and generally free of particulate matter. These solutions
may be sterilized by conventional, well known sterilization techniques. The
compositions may contain pharmaceutically acceptable auxiliary substances as
required to approximate physiological conditions such as pH adjusting and buffering
agents, etc. The concentration of the compound of the invention in such
pharmaceutical formulation can vary widely, i.e., from less than about 0.5%, usually
at or at least about 1%, to as much as 15 or 20% by weight and will be selected
primarily based on fluid volumes, viscosities, etc., according to the particular mode of
administration selected.

Thus, a pharmaceutical composition of the invention for intramuscular
injection could be prepared to contain 1 mL sterile buffered water, and 50 mg of a
compound of the invention. Similarly, a pharmaceutical composition of the invention
for intravenous infusion could be made up to contain 250 mL of sterile Ringer's
solution, and 150 mg of a compound of the invention. Actual methods for preparing
parenterally administrable compositions are well known or will be apparent to those
skilled in the art and are described in more detail in, for example, Remington's
Pharmaceutical Science, 15th ed., Mack Publishing Company, Easton, Pennsylvania.

The compounds described herein can be lyophilized for storage and
reconstituted in a suitable carrier prior to use. This technique has been shown to be
effective with conventional proteins and art-known lyophilization and reconstitution
techniques can be employed.
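
The two example formulations above can be expressed as concentrations with a short calculation. The following is only an illustrative sketch: it assumes a weight/volume percentage (grams per 100 mL) as an approximation for a dilute aqueous solution, and the function name `percent_w_v` is hypothetical rather than taken from this specification.

```python
def percent_w_v(mass_mg: float, volume_ml: float) -> float:
    """Return concentration as % w/v, i.e. grams of compound per 100 mL."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    grams = mass_mg / 1000.0
    return grams / volume_ml * 100.0


# Intramuscular example from the text: 50 mg in 1 mL sterile buffered water.
intramuscular = percent_w_v(50, 1)

# Intravenous example from the text: 150 mg in 250 mL Ringer's solution.
intravenous = percent_w_v(150, 250)

print(f"intramuscular: {intramuscular:.2f}% w/v")
print(f"intravenous:   {intravenous:.2f}% w/v")
```

The same helper can be applied to any other mass and volume pair when comparing candidate formulations.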

The physician will determine the dosage of the present therapeutic agents
which will be most suitable and it will vary with the form of administration and the
particular compound chosen, and furthermore, it will vary with the particular patient
under treatment. The physician will generally wish to initiate treatment with small
dosages substantially less than the optimum dose of the compound and increase the
dosage by small increments until the optimum effect under the circumstances is
reached. It will generally be found that when the composition is administered orally,
larger quantities of the active agent will be required to produce the same effect as a
smaller quantity given parenterally. The therapeutic dosage will generally be from 1
to 10 milligrams per day and higher although it may be administered in several
different dosage units.

Depending on the patient condition, the pharmaceutical composition of the 
invention can be administered for prophylactic and/or therapeutic treatments. In 
therapeutic application, compositions are administered to a patient already suffering 
from a disease in an amount sufficient to cure or at least partially arrest the disease 
and its complications. In prophylactic applications, compositions containing the 
present compounds or a cocktail thereof are administered to a patient not already in a 
disease state to enhance the patient's resistance. 

Single or multiple administrations of the pharmaceutical compositions can be 
carried out with dose levels and pattern being selected by the treating physician. In 
any event, the pharmaceutical composition of the invention should provide a quantity 
of the compounds of the invention sufficient to effectively treat the patient. 

This invention also contemplates use of the galactokinase genes of the instant
invention as a diagnostic. For example, some diseases result from inherited
defective genes. These genes can be detected by comparing the sequence of the
defective gene with that of a normal one. Subsequently, one can verify that a
"mutant" gene is associated with galactokinase deficiency by measurement of
galactose. That is, a mutant gene would be associated with (atypically) elevated
levels of galactose in a patient. In addition, one can insert mutant galactokinase
genes into a suitable vector for expression in a functional assay system (e.g.,
colorimetric assay, expression on MacConkey plates, or complementation experiments,
e.g., in a galactokinase-deficient strain of yeast or E. coli) as yet another means to
verify or identify galactokinase mutations. As an example, RNA from an individual
can be transcribed with reverse transcriptase to cDNA which can then be amplified
by polymerase chain reaction (PCR), cloned into an E. coli expression vector, and
transformed into a galactokinase-deficient strain of E. coli. When grown on
MacConkey indicator plates, galactokinase-deficient cells will produce colonies that
are white in color, whereas cells that have been transformed/complemented with a
functional galactokinase gene will be red (see, e.g., Examples section). If most to
all of the colonies from an individual are red, then the individual is considered to be
normal with respect to galactokinase activity. If approximately 50% of the colonies
are red (the other 50% white), then that individual is likely to be a carrier for
galactokinase deficiency. If most to all of the colonies are white, then that
individual is likely to be galactokinase deficient. Once "mutant" genes have been
identified, one can then screen the population for carriers of the "mutant"
galactokinase gene. (A carrier is a person in apparent health whose chromosomes
contain a "mutant" galactokinase gene that may be transmitted to that person's
offspring.) In addition, monoclonal antibodies that are specific for the mutant
galactokinase proteins can be used for diagnostic purposes as described above.
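
The colony-color readout described above amounts to a simple decision rule on the fraction of red colonies. A minimal sketch follows. The specification gives only the qualitative categories ("most to all" red, "approximately 50%" red, "most to all" white), so the numeric cut-offs of 0.35 and 0.65 and the function name `classify_colonies` are assumptions chosen purely for illustration.

```python
def classify_colonies(red: int, white: int,
                      carrier_band: tuple = (0.35, 0.65)) -> str:
    """Classify an individual from MacConkey plate colony counts.

    carrier_band is a hypothetical window around 50% red colonies;
    the cut-offs are illustrative and are not taken from the specification.
    """
    total = red + white
    if red < 0 or white < 0 or total == 0:
        raise ValueError("need non-negative counts and at least one colony")
    red_fraction = red / total
    low, high = carrier_band
    if red_fraction > high:
        return "normal"            # most to all colonies red
    if red_fraction >= low:
        return "likely carrier"    # roughly half red, half white
    return "likely deficient"      # most to all colonies white


print(classify_colonies(48, 2))
print(classify_colonies(25, 25))
print(classify_colonies(1, 49))
```

In practice the cut-offs would need to be set from control plates; the sketch only shows how the three outcomes in the text map onto a red-colony fraction.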

Individuals carrying mutations in the human galactokinase gene may be
detected at the DNA level by a variety of techniques. Nucleic acids used for
diagnosis (genomic DNA, mRNA, etc.) may be obtained from a patient's cells, such
as from blood, urine, saliva, tissue biopsy (e.g., chorionic villi sampling or removal
of amniotic fluid cells), and autopsy material. The genomic DNA may be used
directly for detection or may be amplified enzymatically by using PCR, ligase chain
reaction (LCR), strand displacement amplification (SDA), etc. (see, e.g., Saiki et al.,
Nature, 324:163-166 (1986), Bej et al., Crit. Rev. Biochem. Molec. Biol., 26:301-334
(1991), Birkenmeyer et al., J. Virol. Meth., 35:117-126 (1991), Van Brunt, J.,
Bio/Technology, 8:291-294 (1990)) prior to analysis. RNA may also be used for
the same purpose. The RNA can be reverse-transcribed and amplified at one time
with PCR-RT (polymerase chain reaction - reverse transcriptase) or
reverse-transcribed to an unamplified cDNA. As an example, PCR primers complementary
to the nucleic acid of the instant invention can be used to identify and analyze
galactokinase mutations. For example, deletions and insertions can be detected by a
change in size of the amplified product in comparison to the normal galactokinase
genotype. Point mutations can be identified by hybridizing amplified DNA to
radiolabeled galactokinase RNA (of the invention) or alternatively, radiolabeled
galactokinase antisense DNA sequences (of the invention). Perfectly matched
sequences can be distinguished from mismatched duplexes by RNase A digestion or
by differences in melting temperatures (Tm). Such a diagnostic would be particularly
useful for prenatal and even neonatal testing.

In addition, point mutations and other sequence differences between the
reference gene and "mutant" genes can be identified by yet other well-known
techniques, e.g., direct DNA sequencing and single-strand conformational
polymorphism (SSCP; Orita et al., Genomics, 5:874-879 (1989)). For example, a
sequencing primer is used with double-stranded PCR product or a single-stranded
template molecule generated by a modified PCR. The sequence determination is
performed by conventional procedures with radiolabeled nucleotides or by
automatic sequencing procedures with fluorescent tags. Cloned DNA segments may
also be used as probes to detect specific DNA segments. The sensitivity of this
method is greatly enhanced when combined with PCR. The presence of nucleotide
repeats may correlate to a change in galactokinase activity (causative change) or
serve as a marker for various polymorphisms.

Genetic testing based on DNA sequence differences may be achieved by
detection of alteration in electrophoretic mobility of DNA fragments in gels with or
without denaturing agents. Small sequence deletions and insertions can be
visualized by high resolution gel electrophoresis. DNA fragments of different
sequences may be distinguished on denaturing formamide gradient gels in which the
mobilities of different DNA fragments are retarded in the gel at different positions
according to their specific melting or partial melting temperatures (see, e.g., Myers
et al., Science, 230:1242 (1985)). In addition, sequence alterations, in particular
small deletions, may be detected as changes in the migration pattern of DNA
heteroduplexes in non-denaturing gel electrophoresis (i.e., heteroduplex
electrophoresis) (see, e.g., Nagamine et al., Am. J. Hum. Genet., 45:337-339
(1989)).
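
The size-based detection of deletions and insertions described above reduces to comparing an observed fragment length with the length expected from the reference sequence. The sketch below illustrates that comparison only. The function name `classify_amplicon`, the optional gel-resolution tolerance and the example sizes are assumptions for illustration and are not part of the specification.

```python
def classify_amplicon(observed_bp: int, reference_bp: int,
                      tolerance_bp: int = 0) -> str:
    """Compare an observed PCR product size with the reference size.

    tolerance_bp is a hypothetical allowance for gel resolution.
    """
    if observed_bp <= 0 or reference_bp <= 0:
        raise ValueError("fragment sizes must be positive")
    difference = observed_bp - reference_bp
    if abs(difference) <= tolerance_bp:
        return "no size change"
    if difference > 0:
        return f"insertion of about {difference} bp"
    return f"deletion of about {-difference} bp"


# Illustrative sizes only: a 346 bp reference product versus three observations.
print(classify_amplicon(346, 346))
print(classify_amplicon(331, 346))
print(classify_amplicon(352, 346))
```

A size comparison of this kind cannot detect point mutations, which is why the text pairs it with hybridization, sequencing and heteroduplex methods.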

Sequence changes at specific locations may also be revealed by nuclease
protection assays, such as RNase and S1 protection or the chemical cleavage method
(e.g., Cotton et al., Proc. Natl. Acad. Sci. USA, 85:4397-4401 (1985)).

Thus, the detection of a specific DNA sequence may be achieved by methods
such as hybridization (e.g., heteroduplex electrophoresis, see White et al.,
Genomics, 12:301-306 (1992)), RNase protection (e.g., Myers et al., Science,
230:1242 (1985)), chemical cleavage (e.g., Cotton et al., Proc. Natl. Acad. Sci. USA,
85:4397-4401 (1985)), direct DNA sequencing, or the use of restriction enzymes
(e.g., restriction fragment length polymorphisms (RFLP) in which variations in the
number and size of restriction fragments can indicate insertions, deletions, presence
of nucleotide repeats and any other mutation which creates or destroys an
endonuclease restriction sequence). Southern blotting of genomic DNA may also be
used to identify large (i.e., greater than 100 base pair) deletions and insertions.

In addition to more conventional gel electrophoresis and DNA sequencing,
mutations (e.g., microdeletions, aneuploidies, translocations, inversions) can also be
detected by in situ analysis (see, e.g., Keller et al., DNA Probes, 2nd Ed., Stockton
Press, New York, N.Y., USA (1993)). That is, DNA (or RNA) sequences in cells
can be analyzed for mutations without isolation and/or immobilization onto a
membrane. Fluorescence in situ hybridization (FISH) is presently the most
commonly applied method and numerous reviews of FISH have appeared. See, e.g.,
Tkachuk et al., Science, 250:559-562 (1990), and Trask et al., Trends Genet., 7:
149-154 (1991), which are incorporated herein by reference for background
purposes. Hence, by using nucleic acids based on the structure of specific genes,
e.g., galactokinase, one can develop diagnostic tests for galactokinase deficiency.

In addition, some diseases are a result of, or are characterized by, changes in 
gene expression which can be detected by changes in the mRNA. Alternatively, the 
galactokinase gene can be used as a reference to identify individuals expressing a 
decreased level of galactokinase, e.g., by Northern blotting or in situ hybridization. 

Defining appropriate hybridization conditions is within the skill of the art.
See, e.g., "Current Protocols in Mol. Biol." Vol. I & II, Wiley Interscience, Ausubel
et al. (ed.) (1992). Probing technology is well known in the art and it is appreciated
that the size of the probes can vary widely but it is preferred that the probe be at least
15 nucleotides in length. It is also appreciated that such probes can be and are
preferably labeled with an analytically detectable reagent to facilitate identification of
the probe. Useful reagents include but are not limited to radioactivity, fluorescent
dyes or enzymes capable of catalyzing the formation of a detectable product. As a
general rule, the more stringent the hybridization conditions, the more closely related
the recovered genes will be.

Also within the scope of this invention are antisense oligonucleotides
predicated upon the sequences disclosed herein for human galactokinase. Synthetic
oligonucleotides or related antisense chemical structural analogs are designed to
recognize and specifically bind to a target nucleic acid encoding galactokinase and
galactokinase mutations. The general field of antisense technology is illustrated by
the following disclosures which are incorporated herein by reference for purposes of
background (Cohen, J.S., Trends in Pharm. Sci., 10:435 (1989) and Weintraub, H.M.,
Scientific American, Jan. (1990) at page 40).
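
An antisense oligonucleotide binds its target by base pairing, so its sequence is the reverse complement of the sense strand. The following sketch illustrates this with the first eight codons of the coding region of SEQ ID NO:4 as the target. The helper name `reverse_complement` is hypothetical, and the choice of this particular 24-nucleotide target region is an assumption made for illustration, not a preferred oligonucleotide of the invention.

```python
# Watson-Crick pairing for DNA bases.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(sense: str) -> str:
    """Return the antisense (reverse-complement) DNA strand, written 5' to 3'."""
    sense = sense.upper().replace(" ", "")
    try:
        return "".join(COMPLEMENT[base] for base in reversed(sense))
    except KeyError as err:
        raise ValueError(f"unexpected base: {err.args[0]}") from None


# First eight codons of the SEQ ID NO:4 coding region (Met Ala Ala Leu Arg Gln Pro Gln).
target = "ATG GCT GCT TTG AGA CAG CCC CAG"
antisense = reverse_complement(target)

print(antisense)
print(len(antisense))
```

At 24 nucleotides this example also satisfies the preference stated above for probes of at least 15 nucleotides.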

Transgenic, non-human, animals may be obtained by transfecting appropriate
fertilized eggs or embryos of a host with nucleic acids encoding human galactokinase
disclosed herein, see for example U.S. Patents 4,736,866; 5,175,385; 5,175,384 and
5,175,386. The resultant transgenic animal may be used as a model for the study of
galactokinase. Particularly useful transgenic animals are those which display a
detectable phenotype associated with the expression of the receptor. Drugs may then
be screened for their ability to reverse or exacerbate the relevant phenotype. This
invention also contemplates operatively linking the receptor coding gene to
regulatory elements which are differentially responsive to various temperature or
metabolic conditions, thereby effectively turning on or off the phenotypic expression
in response to those conditions.

Although not necessarily limiting of this invention, following are some
experimental data illustrative of this invention.

EXAMPLES

Galactokinase (galK) was obtained from human placenta as described by
Stambolian et al. (Biochim. Biophys. Acta, 831:306-312 (1985)), which is incorporated
by reference in its entirety. In essence, human placenta tissue (obtained within 1 hour
of parturition) was homogenized, centrifuged and the resulting supernatant was
adsorbed onto DEAE-Sephacel®. The material was eluted, precipitated with
ammonium sulfate and then run through a sizing column (Sephadex G-100 SF®).
Pooled active fractions were concentrated. Purified protein was obtained following
separation by SDS polyacrylamide electrophoresis and then Western blotted using
standard techniques (see Laemmli, Nature, 227:680-685 (1970), or LeGendre et al.,
BioTechniques, 6:154 (1988)). Minute amounts of galactokinase (micrograms) were
isolated from multiple rounds of protein purification. After a trypsin peptide
digest, 7 peptide sequences were eventually isolated and identified. The three longest
fragments are presented below:
[SEQ ID NO:1]
Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu-
Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg

[SEQ ID NO:2]
His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln-
Ala Ala Asp Gly Ala Lys

[SEQ ID NO:3]
Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys-
Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys

The fragments were compared with peptide sequences encoded by cDNAs, in
which the cDNAs were partially sequenced. The cDNAs (also known as expressed
sequence tags or ESTs) were obtained from Human Genome Sciences, Inc.
(Rockville, MD, USA). The best alignments occurred with an EST sequence from a
human osteoclastoma stromal cell library (SEQ ID NO:1 showed 100% identity over
18 contiguous amino acids) and an EST sequence from a human pituitary library (SEQ
ID NO:2 showed 95.5% identity over 22 contiguous amino acids). A full-length
cDNA from the human osteoclastoma stromal cell library was identified and
sequenced (SEQ ID NO:4) in its entirety on an automated ABI 373A Sequencer.
Sequencing was confirmed on both strands. The corresponding amino acid sequence
(SEQ ID NO:4) was compared against the peptide fragments identified above. SEQ
ID NO:1 corresponds to amino acids 38-68 of the full-length human galactokinase
protein. Similarly, SEQ ID NOs:2 and 3 correspond to amino acids 367-388 and
167-195, respectively, of human galactokinase.
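
As a consistency check on the peptide data above, the three tryptic fragments can be converted to one-letter code and their lengths compared with the residue ranges they are stated to occupy in the full-length protein. This is a sketch for illustration only; the names `THREE_TO_ONE`, `to_one_letter` and `PEPTIDES` are hypothetical.

```python
# Standard three-letter to one-letter amino acid codes.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(three_letter: str) -> str:
    """Convert a space-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[residue] for residue in three_letter.split())


# SEQ ID NO -> (peptide, (first, last) residue in full-length galactokinase)
PEPTIDES = {
    1: ("Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu "
        "Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg", (38, 68)),
    2: ("His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln "
        "Ala Ala Asp Gly Ala Lys", (367, 388)),
    3: ("Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys "
        "Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys", (167, 195)),
}

one_letter = {}
for seq_id, (peptide, (first, last)) in PEPTIDES.items():
    one_letter[seq_id] = to_one_letter(peptide)
    span = last - first + 1
    print(seq_id, one_letter[seq_id], len(one_letter[seq_id]), span)
```

Each fragment ends in Arg or Lys, as expected for peptides released by trypsin, and each length equals the size of the stated residue range.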

Analysis of the Human Galactokinase Gene:

A comparison of the amino acid sequence for human galactokinase with that of
E. coli galactokinase (Debouck et al., Nucl. Acids Res., 13:1841-1853 (1985)) shows
61% similarity and 44.5% identity. Further comparison with another purported
human galactokinase gene (GK2) (Lee et al., Proc. Natl. Acad. Sci. USA, 89:10887-
10891 (1992)) shows 54% similarity and 34.6% identity at the amino acid level.
Furthermore, the GK2 gene maps to human chromosome 15, in contrast to
the gene of the present invention, which maps to human chromosome 17, position q24,
as determined by fluorescence in situ hybridization (FISH) analysis.

SEQ ID NO:4 was hybridized against a Northern blot containing human
messenger RNA from placenta, brain, skeletal muscle, kidney, intestine, heart, lung
and liver according to standard procedures (see, e.g., Sambrook et al., Molecular
Cloning: A Laboratory Manual, 2nd Ed., Cold Spring Harbor Laboratory Press,
1989). Hybridization was strongest with human liver and lung tissue.

Galactokinase Complementation:

SEQ ID NO:4 was subcloned into an E. coli vector, plasmid pBluescript
[Stratagene]. When transformed into C600K-, a galactokinase-deficient strain, the
transformed E. coli grew on MacConkey agar plates containing 1% galactose (and
ampicillin at 50 µg/ml for plasmid selection), and produced brick red colonies,
indicating sugar fermentation. Specifically, the red color is due to the action of acids,
produced by galactose fermentation, upon bile salts and the indicator (neutral red) in
MacConkey medium.

Expression in Mammalian Cells: 

SEQ ID NO:4 was also subcloned into COS-1 cells [ATCC CRL 1650]. The
cells were transfected, grown, and cell lysates were prepared. The lysates were
assayed by a 14C galactokinase assay as described by Stambolian et al. (Exp. Eye Res.,
38:231-237 (1984)), which is hereby incorporated by reference in its entirety. When
expressed in transiently transfected COS cells, galactokinase activity was tenfold
higher than control levels (6600 vs. 640 counts per minute; repeated three times).
These results definitively confirm that SEQ ID NO:4 encodes a full-length,
biologically active, human galactokinase gene.
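
The "tenfold" figure above follows directly from the two reported count rates. A minimal sketch of the arithmetic is shown below. The variable names are hypothetical, and no background subtraction is applied because none is described in the text.

```python
# Reported 14C galactokinase assay readings, in counts per minute.
transfected_cpm = 6600   # COS cells transfected with SEQ ID NO:4
control_cpm = 640        # control COS cells

fold_increase = transfected_cpm / control_cpm
print(f"fold increase over control: {fold_increase:.1f}")
```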

The nucleic acid molecule of the invention can also be subcloned into an
expression vector to produce high levels of human galactokinase (either fused to
another protein, e.g., operatively linked at the 5' end with another coding sequence, or
unfused) in transfected cells. For mammalian cells, the expression vector would
optionally encode a neomycin resistance gene to select for transfectants on the basis of
ability to grow in G418 and a dihydrofolate reductase gene which permits
amplification of the transfected gene in DHFR- cells. The plasmid can then be
introduced into host cell lines, e.g., CHO ACC98, a nonadherent, DHFR- cell line
adapted to grow in serum free medium, and human embryonic kidney 293 cells
(ATCC CRL 1573), and transfected cell lines can be selected by G418 resistance.

Human Galactokinase Gene - Genomic Sequence:

A full-length galactokinase genomic gene coding region was identified from a
lambda phage (λ Fix II) human genomic library (made from human placenta tissue)
using the galK cDNA as a probe. One isolate, designated clone 17, was deposited on
3 May 1995 with the American Type Culture Collection (ATCC), Rockville, MD,
USA, under accession number ATCC 97135, and has been accepted as a patent
deposit, in accordance with the Budapest Treaty of 1977 governing the deposit of
microorganisms for the purposes of patent procedure.

The genomic gene coding region is divided into at least 8 exons isolated from
4 DNA fragments. The arrangement is depicted in Figure 1. The DNA sequence was
determined by using multiple oligonucleotide PCR primers corresponding to the galK
cDNA sequence (i.e., corresponding to galK genomic exons) as well as
oligonucleotide PCR primers subsequently designed that correspond to non-coding
regions (i.e., galK genomic introns). Thus the structure of the galactokinase genomic
gene is summarized in Table 1 below (see also Figure 2 and SEQ ID NO:7):

Table 1

Genomic Galactokinase Gene

Exon #    Amino Acids Encoded    PCR Primer #/[SEQ ID NO]

1         1-55                   3333/[8], 3334/[9], 3598/[10], 3599/[11]
2         56-118                 1868/[12], 3332/[13], 3604/[14], 3605/[15]
3         119-158                3331/[16], 3606/[17]
4         159-204                1657/[18], 3034/[19]
5         205-264                3330/[20], 3607/[21]
6         265-315                1539/[22], 2665/[23]
7         316-369                1891/[24], 2665/[25]
8         370-392                2665/[26], 2666/[27], 2667/[28]
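
The amino acid ranges in Table 1 can be checked for internal consistency: the exons should tile the coding region without gaps or overlaps, and the total should agree with the coding sequence feature given for SEQ ID NO:4 in the Sequence Listing (CDS location 29..1204). The sketch below performs that check; the names `EXONS` and `check_contiguous` are hypothetical.

```python
# (exon number, first amino acid, last amino acid) as listed in Table 1.
EXONS = [
    (1, 1, 55), (2, 56, 118), (3, 119, 158), (4, 159, 204),
    (5, 205, 264), (6, 265, 315), (7, 316, 369), (8, 370, 392),
]


def check_contiguous(exons):
    """Verify that exon ranges tile the protein; return the last position."""
    expected_first = 1
    for number, first, last in exons:
        if first != expected_first or last < first:
            raise ValueError(f"exon {number} breaks the tiling at {first}-{last}")
        expected_first = last + 1
    return expected_first - 1


last_position = check_contiguous(EXONS)

# Codon count implied by the CDS feature of SEQ ID NO:4 (29..1204).
cds_codons = (1204 - 29 + 1) // 3

print(last_position, cds_codons)
```

Both values come out to 392, so the exon boundaries in the table account for every codon position of the annotated coding region.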



Galactokinase Deficiency Marker/Gene: 

A fibroblast cell line (GM00334), derived from a patient with galactokinase
deficiency, was obtained from the Coriell Institute for Medical Research, 401 Haddon
[...] (Biotecx, Houston, TX). Cytoplasmic RNA ([...] µg) was reverse transcribed
with oligonucleotide primers 1823 [SEQ ID NO:29] and 1825 [SEQ ID NO:30]. The
sample was amplified by 35 cycles of [...] for 1 min, 60°C for 1 min, and 72°C for
7 min. The DNA product was purified [...]. Twelve cDNAs in total were sequenced
(representing cloned PCR products of [...] reactions). This procedure [...]
fibroblasts from normal controls (i.e., persons not exhibiting galactokinase deficiency).

A comparison with normal controls identified a single base substitution of A
for G at position 122 of the "normal" human galactokinase gene [SEQ ID NO:4].
The G to A mutation changes amino acid 32 from Val to Met [SEQ ID NO:5] and
creates a new MscI endonuclease restriction site (i.e., TGG↓CCA) on the mutant
allele. This restriction site was then used to rapidly screen for the mutant allele in
the parents of the patient with galactokinase deficiency. [...] was cloned from a
genomic lambda phage library and its DNA sequence was determined, including a
portion of the flanking intron sequences. Oligonucleotide primers (X2-5OUT [SEQ ID
NO:31] and X2-3OUT [SEQ ID NO:32]) [...] sequences [...] amplification of a 346 bp
DNA fragment of the genomic DNA. The PCR product was analyzed for the point
mutation via RFLP, that is, the presence of a newly created MscI site as detected by
[...]. The PCR product from normal DNA is not cleaved by the enzyme MscI, and
thus migrates as a 346 bp fragment on an agarose gel. The PCR product from the
patient with galactokinase deficiency (i.e., the G to A [...]) was cleaved with MscI,
resulting in two fragments of 193 and 153 bp. [...] indicates that the patient was
homozygous [...].

In contrast, PCR products from the parents of this patient, followed by an MscI
digestion, resulted in three fragments (346, 193 and 153 bp), which is consistent with a
heterozygous pattern for the G to A base change. That is, the parents are
carriers of the same mutation.

To determine whether the missense mutation resulted in [...] activity, a cDNA
clone containing the G to A base change was subcloned into COS
cells and assayed for galactokinase activity as previously described. COS cells
transfected with cDNA encoding the missense mutation had the same level of [...].
COS cells transfected with the non-mutant galactokinase cDNA [SEQ ID NO:4] had a
fifty-fold higher activity compared to the host COS cells (i.e., control). This result
supports the Val32 to Met32 substitution as the cause of the decreased enzymatic
activity.
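
The MscI restriction analysis described above can be summarized as a lookup from the observed fragment pattern to a genotype: the uncut 346 bp product indicates an allele without the new site, while the 193 bp and 153 bp products indicate the mutant allele. The sketch below encodes that logic. The function name `genotype_from_fragments` and the genotype labels are hypothetical, and real gels would require a size tolerance that is omitted here.

```python
UNCUT_BP = 346            # PCR product lacking the MscI site
CUT_BP = (193, 153)       # products after MscI cleavage of the mutant allele


def genotype_from_fragments(fragments) -> str:
    """Infer a genotype from MscI digest fragment sizes (exact sizes assumed)."""
    observed = set(fragments)
    has_uncut = UNCUT_BP in observed
    has_cut = set(CUT_BP) <= observed
    if has_uncut and has_cut:
        return "heterozygous carrier"
    if has_cut:
        return "homozygous mutant"
    if has_uncut:
        return "homozygous normal"
    return "uninterpretable"


print(genotype_from_fragments([346]))
print(genotype_from_fragments([193, 153]))
print(genotype_from_fragments([346, 193, 153]))
```

The two cleavage products sum to the size of the uncut product (193 + 153 = 346), which is a quick check that the fragment sizes reported for the parents and the patient are mutually consistent.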

Another mutation was discovered in an unrelated patient having cataracts and
diagnosed as galactokinase deficient (galactokinase activity was found to be close to
zero). Genomic DNA was isolated from lymphoblastoid cell lines and sequenced by
automated sequencing on an ABI 373A sequencer. A single base substitution of T for
G resulted in an in-frame nonsense codon (i.e., TAG) at amino acid position 80 [SEQ
ID NO:6]. This mutation causes premature termination of human galactokinase,
resulting in a truncated protein of 79 amino acids that would be expected to be
non-functional. (Analysis of genomic DNA showed that the parents of this patient were
heterozygous for this mutation, and hence not galactokinase deficient.)
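
The coordinates quoted for the two mutations can be related to the cDNA numbering with simple arithmetic, given that the coding sequence of SEQ ID NO:4 begins at nucleotide 29 (see the CDS feature in the Sequence Listing). The sketch below is illustrative only; the helper names `codon_first_base` and `truncated_length` are hypothetical.

```python
CDS_START = 29  # first base of the ATG start codon in SEQ ID NO:4


def codon_first_base(codon_number: int, cds_start: int = CDS_START) -> int:
    """Return the cDNA position of the first base of a given codon."""
    if codon_number < 1:
        raise ValueError("codon numbers start at 1")
    return cds_start + 3 * (codon_number - 1)


def truncated_length(stop_codon_number: int) -> int:
    """Length of the protein made when a stop codon occupies this position."""
    return stop_codon_number - 1


# Missense mutation: codon 32 (Val to Met).
print(codon_first_base(32))

# Nonsense mutation: stop codon at position 80.
print(codon_first_base(80), truncated_length(80))
```

The first base of codon 32 falls at nucleotide 122, matching the position reported for the G to A substitution, and a stop at codon 80 leaves the 79-residue product described above.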



The above description and examples fully disclose the invention including
preferred embodiments thereof. Those skilled in the art will recognize, or be able to
ascertain using no more than routine experimentation, many equivalents to the specific
embodiments herein. Such equivalents are intended to be within the scope of the
following claims.




WO 96/09374 

PCT/US95/06743 



SEQUENCE LISTING 



(1) GENERAL INFORMATION: 

(i) APPLICANT: Bergsma, Derk J. 

Stambolian, Dwight 



(ii) TITLE OF INVENTION: Human Galactokinase Gene 
(iii) NUMBER OF SEQUENCES: 32

(iv) CORRESPONDENCE ADDRESS:

(A) ADDRESSEE: SmithKline Beecham Corp./Corporate
Intellectual Property

(B) STREET: 709 Swedeland Road/UW2220

(C) CITY: King of Prussia

(D) STATE: Pennsylvania

(E) COUNTRY: USA

(F) ZIP: 19406-0939

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk

(B) COMPUTER: IBM PC compatible

(C) OPERATING SYSTEM: PC-DOS/MS-DOS

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vi) CURRENT APPLICATION DATA:

(A) APPLICATION NUMBER:

(B) FILING DATE:

(C) CLASSIFICATION:

(vii) PRIOR APPLICATION DATA:

(A) APPLICATION NUMBER: PCT/US94/10825

(B) FILING DATE: 23-SEP-1994

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Sutton, Jeffrey A.

(B) REGISTRATION NUMBER: 34,028

(C) REFERENCE/DOCKET NUMBER: P50268-1

(ix) TELECOMMUNICATION INFORMATION:

(A) TELEPHONE: 610-270-5024

(B) TELEFAX: 610-270-5090



(2) INFORMATION FOR SEQ ID NO:1:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 31 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:1:

Val Asn Leu Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu
1               5                   10                  15

Pro Met Ala Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg
            20                  25                  30

(2) INFORMATION FOR SEQ ID NO:2:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:2:

His Ile Gln Glu His Tyr Gly Gly Thr Ala Thr Phe Tyr Leu Ser Gln
1               5                   10                  15

Ala Ala Asp Gly Ala Lys
            20

(2) INFORMATION FOR SEQ ID NO:3:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 29 amino acids
(B) TYPE: amino acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: protein

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:3:

Ala Gln Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys
1               5                   10                  15

Gly Ile Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys
            20                  25

(2) INFORMATION FOR SEQ ID NO:4:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:4:

GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
52

                               Met Ala Ala Leu Arg Gln Pro Gln
                               1               5

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
100

Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
    10                  15                  20

GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC
148

Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu
25                  30                  35                  40

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
196

Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
                45                  50                  55

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
244

Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
            60                  65                  70

GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
292

Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu
        75                  80                  85

CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
340

Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro
    90                  95                  100

CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
388

Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala
105                 110                 115                 120

CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
436

Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly
                125                 130                 135

GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
484

Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe
            140                 145                 150

CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
532

Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln
        155                 160                 165

GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
580

Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile
    170                 175                 180

ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
628

Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu
185                 190                 195                 200

ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
676

Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro
                205                 210                 215

AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
724

Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala
            220                 225                 230

TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
772

Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg
        235                 240                 245

GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
820

Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu
    250                 255                 260

GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
868

Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His
265                 270                 275                 280

GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
916

Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg
                285                 290                 295

CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
964

Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg
            300                 305                 310

TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
1012

Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu
        315                 320                 325

GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
1060

Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr
    330                 335                 340

GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
1108

Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala
345                 350                 355                 360

GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
1156

Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala
                365                 370                 375

ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
1204

Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu
            380                 385                 390

TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
1264

TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG
1324

AAAAAAAAAA AAAAAAAAAC TCGAG
1349



(2) INFORMATION FOR SEQ ID NO:5:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..1204

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:5:

GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
52

                               Met Ala Ala Leu Arg Gln Pro Gln
                               1               5

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
100

Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
    10                  15                  20

GGG GCC GAG CCC GAG CTG GCC ATG TCA GCG CCG GGC CGC GTC AAC CTC
148

Gly Ala Glu Pro Glu Leu Ala Met Ser Ala Pro Gly Arg Val Asn Leu
25                  30                  35                  40

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
196

Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
                45                  50                  55

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
244

Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
            60                  65                  70

GTG TCT CTC CTC ACC ACC TCT GAG GGT GCC GAT GAG CCC CAG CGG CTG
292

Val Ser Leu Leu Thr Thr Ser Glu Gly Ala Asp Glu Pro Gln Arg Leu
        75                  80                  85

CAG TTT CCA CTG CCC ACA GCC CAG CGC TCG CTG GAG CCT GGG ACT CCT
340

Gln Phe Pro Leu Pro Thr Ala Gln Arg Ser Leu Glu Pro Gly Thr Pro
    90                  95                  100

CGG TGG GCC AAC TAT GTC AAG GGA GTG ATT CAG TAC TAC CCA GCT GCC
388

Arg Trp Ala Asn Tyr Val Lys Gly Val Ile Gln Tyr Tyr Pro Ala Ala
105                 110                 115                 120

CCC CTC CCT GGC TTC AGT GCA GTG GTG GTC AGC TCA GTG CCC CTG GGG
436

Pro Leu Pro Gly Phe Ser Ala Val Val Val Ser Ser Val Pro Leu Gly
                125                 130                 135

GGT GGC CTG TCC AGC TCA GCA TCC TTG GAA GTG GCC ACG TAC ACC TTC
484

Gly Gly Leu Ser Ser Ser Ala Ser Leu Glu Val Ala Thr Tyr Thr Phe
            140                 145                 150

CTC CAG CAG CTC TGT CCA GAC TCG GGC ACA ATA GCT GCC CGC GCC CAG
532

Leu Gln Gln Leu Cys Pro Asp Ser Gly Thr Ile Ala Ala Arg Ala Gln
        155                 160                 165

GTG TGT CAG CAG GCC GAG CAC AGC TTC GCA GGG ATG CCC TGT GGC ATC
580

Val Cys Gln Gln Ala Glu His Ser Phe Ala Gly Met Pro Cys Gly Ile
    170                 175                 180

ATG GAC CAG TTC ATC TCA CTT ATG GGA CAG AAA GGC CAC GCG CTG CTC
628

Met Asp Gln Phe Ile Ser Leu Met Gly Gln Lys Gly His Ala Leu Leu
185                 190                 195                 200

ATT GAC TGC AGG TCC TTG GAG ACC AGC CTG GTG CCA CTC TCG GAC CCC
676

Ile Asp Cys Arg Ser Leu Glu Thr Ser Leu Val Pro Leu Ser Asp Pro
                205                 210                 215

AAG CTG GCC GTG CTC ATC ACC AAC TCT AAT GTC CGC CAC TCC CTG GCC
724

Lys Leu Ala Val Leu Ile Thr Asn Ser Asn Val Arg His Ser Leu Ala
            220                 225                 230

TCC AGC GAG TAC CCT GTG CGG CGG CGC CAA TGT GAA GAA GTG GCC CGG
772

Ser Ser Glu Tyr Pro Val Arg Arg Arg Gln Cys Glu Glu Val Ala Arg
        235                 240                 245

GCG CTG GGC AAG GAA AGC CTC CGG GAG GTA CAA CTG GAA GAG CTA GAG
820

Ala Leu Gly Lys Glu Ser Leu Arg Glu Val Gln Leu Glu Glu Leu Glu
    250                 255                 260

GCT GCC AGG GAC CTG GTG AGC AAA GAG GGC TTC CGG CGG GCC CGG CAC
868

Ala Ala Arg Asp Leu Val Ser Lys Glu Gly Phe Arg Arg Ala Arg His
265                 270                 275                 280

GTG GTG GGG GAG ATT CGG CGC ACG GCC CAG GCA GCG GCC GCC CTG AGA
916

Val Val Gly Glu Ile Arg Arg Thr Ala Gln Ala Ala Ala Ala Leu Arg
                285                 290                 295

CGT GGC GAC TAC AGA GCC TTT GGC CGC CTC ATG GTG GAG AGC CAC CGC
964

Arg Gly Asp Tyr Arg Ala Phe Gly Arg Leu Met Val Glu Ser His Arg
            300                 305                 310

TCA CTC AGA GAC GAC TAT GAG GTG AGC TGC CCA GAG CTG GAC CAG CTG
1012

Ser Leu Arg Asp Asp Tyr Glu Val Ser Cys Pro Glu Leu Asp Gln Leu
        315                 320                 325

GTG GAG GCT GCG CTT GCT GTG CCT GGG GTT TAT GGC AGC CGC ATG ACG
1060

Val Glu Ala Ala Leu Ala Val Pro Gly Val Tyr Gly Ser Arg Met Thr
    330                 335                 340

GGC GGT GGC TTC GGT GGC TGC ACG GTG ACA CTG CTG GAG GCC TCC GCT
1108

Gly Gly Gly Phe Gly Gly Cys Thr Val Thr Leu Leu Glu Ala Ser Ala
345                 350                 355                 360

GCT CCC CAC GCC ATG CGG CAC ATC CAG GAG CAC TAC GGC GGG ACT GCC
1156

Ala Pro His Ala Met Arg His Ile Gln Glu His Tyr Gly Gly Thr Ala
                365                 370                 375

ACC TTC TAC CTC TCT CAA GCA GCC GAT GGA GCC AAG GTG CTG TGC TTG
1204

Thr Phe Tyr Leu Ser Gln Ala Ala Asp Gly Ala Lys Val Leu Cys Leu
            380                 385                 390

TGAGGCACCC CCAGGACAGC ACACGGTGAG GGTGCGGGGC CTGCAGGCCA GTCCCACGGC
1264

TCTGTGCCCG GTGCCATCTT CCATATCCGG GTGCTCAATA AACTTGTGCC TCCAATGTGG
1324

AAAAAAAAAA AAAAAAAAAC TCGAG
1349



(2) INFORMATION FOR SEQ ID NO:6:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 1349 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: cDNA

(ix) FEATURE:
(A) NAME/KEY: CDS
(B) LOCATION: 29..265

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:6:

GAATTCGGCA CGAGTGCAGG CGCGCGTC ATG GCT GCT TTG AGA CAG CCC CAG
52

                               Met Ala Ala Leu Arg Gln Pro Gln
                               1               5

GTC GCG GAG CTG CTG GCC GAG GCC CGG CGA GCC TTC CGG GAG GAG TTC
100

Val Ala Glu Leu Leu Ala Glu Ala Arg Arg Ala Phe Arg Glu Glu Phe
    10                  15                  20

GGG GCC GAG CCC GAG CTG GCC GTG TCA GCG CCG GGC CGC GTC AAC CTC
148

Gly Ala Glu Pro Glu Leu Ala Val Ser Ala Pro Gly Arg Val Asn Leu
25                  30                  35                  40

ATC GGG GAA CAC ACG GAC TAC AAC CAG GGC CTG GTG CTG CCT ATG GCT
196

Ile Gly Glu His Thr Asp Tyr Asn Gln Gly Leu Val Leu Pro Met Ala
                45                  50                  55

CTG GAG CTC ATG ACG GTG CTG GTG GGC AGC CCC CGC AAG GAT GGG CTG
244

Leu Glu Leu Met Thr Val Leu Val Gly Ser Pro Arg Lys Asp Gly Leu
            60                  65                  70

GTG TCT CTC CTC ACC ACC TCT TAGGGTGCCG ATGAGCCCCA GCGGCTGCAG
295

Val Ser Leu Leu Thr Thr Ser
        75

TTTCCACTGC CCACAGCCCA GCGCTCGCTG GAGCCTGGGA CTCCTCGGTG GGCCAACTAT
355

GTCAAGGGAG TGATTCAGTA CTACCCAGCT GCCCCCCTCC CTGGCTTCAG TGCAGTGGTG
415

GTCAGCTCAG TGCCCCTGGG GGGTGGCCTG TCCAGCTCAG CATCCTTGGA AGTGGCCACG
475

TACACCTTCC TCCAGCAGCT CTGTCCAGAC TCGGGCACAA TAGCTGCCCG CGCCCAGGTG
535

TGTCAGCAGG CCGAGCACAG CTTCGCAGGG ATGCCCTGTG GCATCATGGA CCAGTTCATC
595

TCACTTATGG GACAGAAAGG CCACGCGCTG CTCATTGACT GCAGGTCCTT GGAGACCAGC
655



CTGGTGCCAC TCTCGGACCC CAAGCTGGCC GTGCTCATCA CCAACTCTAA TGTCCGCCAC
715

TCCCTGGCCT CCAGCGAGTA CCCTGTGCGG CGGCGCCAAT GTGAAGAAGT GGCCCGGGCG
775

CTGGGCAAGG AAAGCCTCCG GGAGGTACAA CTGGAAGAGC TAGAGGCTGC CAGGGACCTG
835

GTGAGCAAAG AGGGCTTCCG GCGGGCCCGG CACGTGGTGG GGGAGATTCG GCGCACGGCC
895

CAGGCAGCGG CCGCCCTGAG ACGTGGCGAC TACAGAGCCT TTGGCCGCCT CATGGTGGAG
955

AGCCACCGCT CACTCAGAGA CGACTATGAG GTGAGCTGCC CAGAGCTGGA CCAGCTGGTG
1015

GAGGCTGCGC TTGCTGTGCC TGGGGTTTAT GGCAGCCGCA TGACGGGCGG TGGCTTCGGT
1075

GGCTGCACGG TGACACTGCT GGAGGCCTCC GCTGCTCCCC ACGCCATGCG GCACATCCAG
1135

GAGCACTACG GCGGGACTGC CACCTTCTAC CTCTCTCAAG CAGCCGATGG AGCCAAGGTG
1195

CTGTGCTTGT GAGGCACCCC CAGGACAGCA CACGGTGAGG GTGCGGGGCC TGCAGGCCAG
1255

TCCCACGGCT CTGTGCCCGG TGCCATCTTC CATATCCGGG TGCTCAATAA ACTTGTGCCT
1315

CCAATGTGGA AAAAAAAAAA AAAAAAAACT CGAG
1349

(2) INFORMATION FOR SEQ ID NO:7:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 7676 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: double
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:7:

CCGAGCATCC CGCGCCGACG GGTCTGTGCC GGAGCAGCTG TGCAGAGCTG CAGGCGCGCG
60

TCATGGCTGC TTTGAGACAG CCCCAGGTCG CGGAGCTGCT GGCCGAGGCC CGGCGAGCCT
120

TCCGGGAGGA GTTCGGGGCC GAGCCCGAGC TGGCCGTGTC AGCGCCGGGC CGCGTCAACC
180

TCATCGGGGA ACACACGGAC TACAACCAGG GCCTGGTGCT GCCTATGGTG AGGGGCTGCA
240

CGGGGAGCCC CTAGCCCGCC GCCGCCTGTC CCGGTCGCCG AGGAGGGCGG GCCTCGGGGA
300

CGCTGGGGGC GAGTTCTTCC CGCGGGAGAT GTGGGGCGGG CAGCTGCGCC TGGAGCACCG
360

GTGCACGGAA GAGTCCCCGG GACAGGCTGT TCCCCACGTT GGAAGGGAGG AAGCGAAGAA
420

GTGGTCCCCA GAGGGTGCGC GGCCGCCTCT TGGCTCAAGC CCGCCCTCTG GGGGCTGGGG
480






CTCCTCGCCT TCAACCTGGG AGCATGTTCC CCTTAAACTG TGAGGCCCTG TGTGCCACGC
540

AGAAGGGGAC ACTCCGCGCC TCCGGCCACC GTGGGGCCCC AACCGCAGAC CTGGGCGAAC
600

GTAGCCTTCT GGCCCAGCCC GTTCAATTTA CAGAGGAGGA AACTGAGGCC TAGAGAGGCC
660

CAGTGAACTG CTGGAGGTCA CACAGCAGGT TCTTGGCGGG GCTGCGACTT GGGAGTGAGG
720

ACTCCCAGCT TTCAGCGGGG GGCGCTTTCC GCCCCATCTG CAGCTTGGGG AGTGCACAGG
780

TACAGGATGT CCAGAGCCAC CCAAAATGTA AAGGCTTTGG AGCTCCAGTG ATCTGTTTTC
840

CCTTTGGGCT AAGCTCTCCC CCCTTGCCCC ACAGCTCAGG GCAGAGTCCA GGTCTGTGCT
900

CCAGCTGCAG CCGCCCCGCC CCTGAAGACC TAAGGGGGCA GGGCTCAAGC CCCCAAGGTC
960

AGCTGGCCCT CAGGATCTTC CCTGCGACGC TGAACCTGGA GGTTCAGAAC CTGATGACTG
1020

TGGAGGCATC AGAACCTCGG CTGGAGGCAG TGTCATTGGA GAGGCTTACT CCAGCTGGCG
1080

GAAGCCTCAC GTACTGCTTG TCTCTCCTGC CAGGCTCTGG AGCTCATGAC GGTGCTGGTG
1140

GGCAGCCCCC GCAAGGATGG GCTGGTGTCT CTCCTCACCA CCTCTGAGGG TGCCGATGAG
1200

CCCCAGCGGC TGCAGTTTCC ACTGCCCACA GCCCAGCGCT CGCTGGAGCC TGGGACTCCT
1260

CGGTGGGCCA ACTATGTCAA GGGAGTGATT CAGTACTACC CAGGTATGGG GCCCAGGCCT
1320

GAGCCAAGTC CTCACTGATA CTAGGAGTGC CACCTCACAG CCACAGAGCC CATTCATTTG
1380

TCTGATACAC TGTGGGGAAG GCTTGTAGAG TGGAGCATCC CATTGTACAG ATGAGGAAAC
1440

TGATGCCCCC AGAAGGTCGG GAACTTGCCC TGGGTTTCCC GTGACCTGAT TGGAGGAGCC
1500

AGGATTTGAA CCCCAGCCTT TTTTCCCTCC AGAGCCCTAA ACCAGGAGGA CAATTAGAAG
1560

TGTCCCAGCA ACCTCAGAGG GTGGGAAAAT GGAGGGGAGT GGGTCCCTTG GGCCAGCAGG
1620

TTGGTGGGGT TCTTGACAAT TGAGACACAC ACCTAGAAAC AGTTGCTAGG CCGTTGCTGC
1680

CCTTCCCGCC AGGACACCTG CCCTTCCTGT CCAATCCTCC CAGGCAGCCT CTCTTACCAT
1740

CACCTGTTCT TTCCCCCTGC AGCTGCCCCC CTCCCTGGCT TCAGTGCAGT GGTGGTCAGC
1800

TCAGTGCCCC TGGGGGGTGG CCTGTCCAGC TCAGCATCCT TGGAAGTGGC CACGTACACC
1860

TTCCTCCAGC AGCTCTGTCC AGGTACCAGC TAGGCCCCAG CCCTGACCCA GCCCTCCTTC
1920

CCTGAGGTCT CCAGGTGGTC CCAGCTTCTA CTATGCCTTA TGGAGGGGGT GGCAGGGAAT
1980

CTCCCTGGAG TGTCATTGAA GCCACTGCTG CTTCCACCAG CCCTAGCCTC CCCACCTCAC
2040

CCTGTACTGC AGACTCGGGC ACAATAGCTG CCCGCGCCCA GGTGTGTCAG CAGGCCGAGC
2100

ACAGCTTCGC AGGGATGCCC TGTGGCATCA TGGACCAGTT CATCTCACTT ATGGGACAGA
2160

AAGGCCACGC GCTGCTCATT GACTGCAGGT TGGGCTCGCT CCCCTCGTCC CCTCCCGCCC
2220

TGCACTCAGC AGCTCCTGGG TGGAGTGTGC CCACTGCCTG GCGCAGCAAG CACACGCTTG
2280

GCCTCGTCAT CTCCCCCATT GTAACTCCAC CCCAGGTCCT TGGAGACCAG CCTGGTGCCA
2340

CTCTCGGACC CCAAGCTGGC CGTGCTCATC ACCAACTCTA ATGTCCGCCA CTCCCTGGCC
2400

TCCAGCGAGT ACCCTGTGCG GCGGCGCCAA TGTGAAGAAG TGGCCCGGGC GCTGGGCAAG
2460

GAAAGCCTCC GGGAGGTACA ACTGGAAGAG CTAGAGGGTG AGAACTGCCA GGGTGCTCTA
2520

TCCTGGAGGC GGCTGTGCTC CCTGCTGGCG CCTCAGTGTG GCCTTGACCC TGCCTGGGAC
2580

CCCGATCTCC AGGGGCTTCT GCCATGCTCT CCCCAGTCCC TTCAAACACT GCGCACCCAG
2640

GGTTCCAATC TCAGCAGGGG TGCTTGAAAT CCTAAAATGG TCTTATCTAA TCAGAAAAAT
2700

CATGTTTCCA TTGTGGAAAA TGTAGAAAAG TACAAAGTAG AAAATAATAA GCTATAAGGG
2760

CACTACCCAG AGATAGGCAC TGCTGACATT TTCACGTTTC CTTTCAGTAT TTTTCCACAT
2820

CTGTCTTCAA AGCTGAGTAT ATGTAATATA TCATCACTTT CCCCCCCCAC CCCCTTTTTT
2880

TTAAGAGGCA GGGTCTCATT CTGTTGCCCA AGCTGGAGTG TAGTGGTGTG ATCATAGCTT
2940

ACTGCAAACT TGAACTCTTG AGCTCAAGGG ATCCTCCCAG CTCAGCCTTC CAAGTAGCTG
3000

AGATTACAGG TGTGCCACCA TGCCCGGCTA ATTTTTATCT TCGTAAAGAC GGCCTTGTAG
3060

TGTTGCCCAG GATGATCCTG AACTCTGGCC TCAAGAGGTC CTCCTGCCTT GGGCTCCCAA
3120

AGTGTTGGGA TTATAGGCAT GAGCCACTGC GGCCAGCCCA TTTGCCGTGT TTTTTTTTTG
3180

GACACAGAGT TTCGGTCTTG TCACCCATGC TGGAGTGCAA TGGTGCGATC TCAGCTCACT
3240

GTAACCTCTG CCTCCCGGGT TCAAGTGATT CTCCTGCCTC AGCCTCCCGA GTAGCTGGGA
3300

CTACAGGCGC CCGCCACTAC GCCTGGCACA TTTTTTATAG TTCTAGTAGA GACTGGGGTT
3360

TCACCATGTT GGCCAGGCTG GTCTCAAACG CCTGACCTCA GGTGATCCTC CCGCCTCAGC
3420

CTTCCAAAGT GCTGGGATTA CAGGCGTGAG CCATAGTGCC GGTCTCTTTT TTTTTTTTTT
3480

TTAAACTAAA CATAATCTCA GAACCCAGAA CCCTATCTTA TCTTATGCCA TGAAAGGCAT
3540

ATCTCGGCGT GGCTCTTTTT TTTTTTTTTT CTTTTTTTTT GGGCGAGGTG GAGGCTTGCC
3600

CTGTTGCCCA GGCTGGAGTG CAGCGGCGCA ATCTCGGTTC ACTGCATCCT CCACCTCCTG
3660

GGTCCAAATG ATCCTCCTGC CTTAGCTTCC TGAGTAGGTG GGATTACTGG AACCCACCAC
3720

CACGCCCAGC CAATTTTTAT ATTTTTAGTA GAGACGGGGT TTCATGTTGG CCAGGCTGGC
3780

CTCGAACTCC TGACCTCGTG ATCTGCCCGC CTCAGCCTCC CAATGTGCTA GGATTACATG
3840

TGTGAGCCAC TGCACCTGGC CTCCGTGTGG CTCTTTAAAG CTCCACAATA TTTTAGCATT
3900

CAGGTGCTCT GTCATTTACT TAACTATTTT CTGATACACC TCACACTGCG ATTAACTTTC
3960

CTTATTTATC TTTTTTATTA TTTATTTATT TATTTATTTG AGACAGAGTC TTGCTCTGTC
4020

ACCCAGGCTG GAGTGCAGTG GCACGATCTC GGCTCACTGC AACCTCTGCC TCCCAGGTTC
4080

AAGTGATTCT CCTGCCTCAG CCTCCTGAGT AGCTAGGATT AGAGGCATGT GCCACCACAC
4140

CTGGCTAATC TTCGTATTTT TAGCAGAGAT GAGGTTTTAC CATGTTGGTC GGGCTGGTCG
4200

TGAACTCCTG ACCTGGTGAT CTGCCCACCT CAGCCTCCCA AAGTACTGGG ATGACAGGCA
4260

TGAACCACTG TGCCTGGCCA TCTTTTTTAT TTTTTAAAGA GATGGGTTCT GCTAAGTTGC
4320

CCAGGCTGGA CCTGAACTCT TGGGCTCAAG TAATCTTCTC ACCTAGTCTC CTGGGTAGCT
4380






GCAACCAAAG GCACCCGGTT TATCTGCATT CTCTTTTTTT TCTTTGAGAC TGAGTCTTGC
4440

TCTGTAGCCC AGGCTGGAGC GCAGTGGCGT GATCTCGGCT CACTGCAACC TCCGTCTTCA
4500

GGGTTCAAGC AATTCTCCTG CCTCAGCCTC TGGAGTGGCT GGGACTACAG GCGTGTGCCA
4560

CCAGAGCGAG TTAATTTTTT TTTTTTTTTG TATTTTTAGT GGACACTGGG TTTCACTATA
4620

TTGGCCAGGC TGGTCTTGGA CTCCTGACCT CAAGTGATCC GCCTGCCTTG GCCTCCCAAA
4680

GTGCTGGGAT TACAGGCACA GGCGTGAGCC ACTACACCTG GCCTATCTGC ATTCTCTTAA
4740

TAGTTTCTTA GAAATGGATT CTTAGGAGTA GGATTACAGA GTCAAGAGAC ACAAGTTTTG
4800

TAGGCTGGGT GCGGTGGCTC ACGTCTGTGC CTGTAATCCC AGTACTTTAG GAGGCCAAGG
4860

TGGGCAGATT CATTGAGCTC AGGAATTCGA GACCAGCCTG GGCAACATGG CAAAACCCCA
4920

TCTCTAAAGA AATACAAAAA TTAGCCAGGT GTGGTGGTGT GTGCCTGTAG TCCTAGCTAC
4980

TTAGGAGGCT GGGGTGGGAG GATCAATTGA GCCCAGGAGG TTGAGACTGC AGTGAGCTGT
5040

GATTGCACCA TGGCACTCCA GCCTGGGCCT CAAAGTGAGA TCCTGTCTCC AAAACAAAAA
5100

AGATACAAGT ATCCTTAAGG CTCCTGCTAC ACATGGCCAG GAAGGTAGTC TATTGGACAG
5160

TTTTAAGGTC ATTATCAATA TTAGCTCATT TAATTCCCTC CAAAACTCTG TAAAGCACAT
5220

TCTGCTACCA TAGTTGTCAT ATTTTTGATG GGGGAATCTA CAGTGAGAGG CAGTGCTGGG
5280

ATCTGAACCC CATCTGGACA GATTAGCTCC AGGGCCCATG CTCTTGACTG GCTGGCCGCG
5340

CTGCCCACAC TGAGTTGTTC CTTCCTGGCA GGGTAGGTGT GCCTATCTCA GGGACACTAG
5400

ACAGCTCCGA GGGACCTCCC TGTCCTTTTC CTTTGTGAAC TGTGTCACGT TCTCCAGAGC
5460

AGGGCTCAGA CCTGCCCTGC CTGCTCTGTG CAGATGCCCT TGGCCAAGGT TTTCACACTG
5520

GAACAAGTTG GTCCCTCCTC CCCACCCCAG CCTGTCCTTG GCCCTCCTCC AGGTCTCCTT
5580

CTGCATAGGA GCAGCTCACC CTGCCTCCTC CAGAGTCCTG CCCTAGAAGC GCAATCCCTC
5640

TCCTTCCATC CCCTGCCTGG CTGCCTGGCT CCTTCCCTCA GCCTCCAAGA CATGCTCAGT
5700

TTTCTTCCCT CCTAAAACAC CACCCACTGT CTCATTTCCA TTCATTTCTT TCTTTCTTTC
5760

TTTCTTTTTT TTTTTTGAGA GGGAGCCTCA CTCTGTCACC CAGGCTGAAG TGCAGTGGCA
5820

TGATCTCCAC TCACTGCAAC CTCCGCCTCC CAGGTTCAAG CAATTCTCCT GCCTCAGCCT
5880

CCTGAGTAGC TGGGATTACA GGCGCCTGCC ACGATGCCCG GCTAACTTTT GTATTTTTAG
5940

TAGAGACGGG GTTTCGCCAT GTTGGCCAGG CTGGTCTCGA GCTCCTGACC TCAGGCAATC
6000

TGCCTGCCTC AGCTTCCCAA AGTGCTGGGA TTACAGGTGT GAGCCACCGC GCCCACCCAT
6060

TCATTTCTCA GTCCTTTGAA TCTACTTGCC CCTCCATCCC GCCATGCCAC CTACCCTAAC
6120

AACCTTCCCC CTTAAACCTG CGGGTTTGGC CGGGCGCAGT ACACTGAGTC AGTACTGGTA
6180

CTGACCCAGG TACCCCTCCA GCCTCAGCTC CAGTCAGATG GGACAGCCTG CTGGTCCCTG
6240

GCTGCTTCTG CCCCCTCTTC TGGAGCCCCA GCCCTGGAGG CTCCATGTGG CTCAGCAGAA
6300

CTTCTTCTCC TCCTGCTCTG TGGTGGCCTC TTGAGGGCAG CACTCACCTT GGAAAGCATG
6360

GAGTGTTTCA ACCCTCACTG CTCCCTGAAG GACCAAGGTG TCCCATTTTA CAGTCGGGGG
6420

AGGAGGCACT GTGATAAAGG GGCTCTTCAG ACCCACGTCT GAGAGAGCCA GGCTGCGCCG
6480

CCCCCGCGGC CTTCCACCCT TCACCGTCCA GCCAGGGCCA CTGCCATCAC CGCCTGCTGG
6540

TCCTCACAGG CGTCGGGGCC CCAGGCAGTG AGAAGGCGGC TGCTGACTCC TCTTTCCTCC
6600

CCAGCTGCCA GGGACCTGGT GAGCAAAGAG GGCTTCCGGC GGGCCCGGCA CGTGGTGGGG
6660

GAGATTCGGC GCACGGCCCA GGCAGCGGCC GCCCTGAGAC GTGGCGACTA CAGAGCCTTT
6720

GGCCGCCTCA TGGTGGAGAG CCACCGCTCA CTCAGGTGAG GCCCTCTGGG CGCCCCGCTC
6780

CTGCCGGGCA CAGGCCGGCC CAGGCCCACC CCTTCAATAT CCTCTCTGCA GAGACGACTA
6840

TGAGGTGAGC TGCCCAGAGC TGGACCAGCT GGTGGAGGCT GCGCTTGCTG TGCCTGGGGT
6900

TTATGGCAGC CGCATGACGG GCGGTGGCTT CGGTGGCTGC ACGGTGACAC TGCTGGAGGC
6960

CTCCGCTGCT CCCCACGCCA TGCGGCACAT CCAGGTGGGC GGGCACCAGG GCCTGGGCGG
7020

GCAGGAGCGG CAGCTTCCCG GGGCCCTGCC ACTCACCCCC AGCCCGCCTC TTACAGGAGC
7080

ACTACGGCGG GACTGCCACC TTCTACCTCT CTCAAGCAGC CGATGGAGCC AAGGTGCTGT
7140

GCTTGTGAGG CACCCCCAGG ACAGCACACG GTGAGGGTGC GGGGCCTGCA GGCCAGTCCC
7200

ACGGCTCTGT GCCCGGTGCC ATCTTCCATA TCCGGGTGCT CAATAAACTT GTGCCTCCAA
7260

TGTGGTACCT GCCTCCTCTA GAGGTGGGTG TATGCTTGGG TGTCAGAGAA TGGGGGATGT
7320

CAGAACCGCT CCCCTACCCT AGGGGAGCAC CTCTCAGGCC CCAGAAGAAT GGGCAAGGCA
7380

GGGCCTAGCA GTAGCAAAAC CATTTATTAA GTGCAGAACA AAGGCTGGGT CCTTGTGCTG
7440

CTCCCAGCTC TTTGGTTACA AATAGGTTTG GGCCCACAGA GGACGGACCT TGCCCCCTTC
7500

ATGCCTCCCA GGAGACACCT AGCCCCTGCT CTGTGCATGC GGGTGGGCTG GGCCCCCAGG
7560

GGTGCAAGGA TGGAGTAGCT GAGGAGGCTC CGGGAGAGGA GTCGGGAGGA CGCCTAGTGG
7620

GACATTGCGG GGGTGGCGCA GGGTGCGGTC AAGTTTGGAA GAAACTGTTG GGTCCA
7676



(2) INFORMATION FOR SEQ ID NO:8:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:8:

AGCCTTCCGG GAGGAGTTCG G
21

(2) INFORMATION FOR SEQ ID NO:9:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:9:

CTGGTTGTAG TCCGTGTGTT C
21

(2) INFORMATION FOR SEQ ID NO:10:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:10:

GCCAGCAGCT CCGCGACCTG G
21

(2) INFORMATION FOR SEQ ID NO:11:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:11:

GCTTCCTCCC TTCCAACGTG G
21

(2) INFORMATION FOR SEQ ID NO:12:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:12:

CCCAGGCTCC AGCGAGCGCT G
21

(2) INFORMATION FOR SEQ ID NO:13:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:13:

ACCTCTGAGG GTGCCGATGA G
21

(2) INFORMATION FOR SEQ ID NO:14:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:14:

CCCACAGCTC AGGGCAGAGT C
21

(2) INFORMATION FOR SEQ ID NO:15:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:15:

GGACACTTCT AATTGTCCTC C
21

(2) INFORMATION FOR SEQ ID NO:16:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:16:

GATGAACTGG TCCATGATGC C
21

(2) INFORMATION FOR SEQ ID NO:17:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:17:

AGGGGCACTG AGCTGACCAC C
21

(2) INFORMATION FOR SEQ ID NO:18:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:18:

CACTTCTACA CATTGGCGCC G
21

(2) INFORMATION FOR SEQ ID NO:19:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:19:

CTTCGCAGGG ATGCCCTGTG G
21

(2) INFORMATION FOR SEQ ID NO:20:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:20:

TCATCACCAA CTCTAATGTC C
21

(2) INFORMATION FOR SEQ ID NO:21:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:21:

TGTCAGCAGT GCCTATCTCT G
21

(2) INFORMATION FOR SEQ ID NO:22:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:22:

AGCAGCGGAG GCCTCCAGCA G
21

(2) INFORMATION FOR SEQ ID NO:23:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:23:

CCTCACCGTG TGCTGTCCTG G
21

(2) INFORMATION FOR SEQ ID NO:24:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:24:

GGCTGCGCTT GCTGTGCCTG G
21

(2) INFORMATION FOR SEQ ID NO:25:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:25:

CCTCACCGTG TGCTGTCCTG G
21

(2) INFORMATION FOR SEQ ID NO:26:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:26:

CCTCACCGTG TGCTGTCCTG G
21

(2) INFORMATION FOR SEQ ID NO:27:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:27:

GCGGGACTGC CACCTTCTAC C
21

(2) INFORMATION FOR SEQ ID NO:28:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 21 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:28:

CTCAATAAAC TTGTGCCTCC A
21

(2) INFORMATION FOR SEQ ID NO:29:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 23 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:29:

CGGATATGGA AGATGGCACC GGG
23

(2) INFORMATION FOR SEQ ID NO:30:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 22 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:30:

AGAGCTGCAG GCGCGCGTCA TG
22

(2) INFORMATION FOR SEQ ID NO:31:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 19 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:31:

CCGAGCATCC CGCGCCGAC
19

(2) INFORMATION FOR SEQ ID NO:32:

(i) SEQUENCE CHARACTERISTICS:
(A) LENGTH: 20 base pairs
(B) TYPE: nucleic acid
(C) STRANDEDNESS: single
(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA (genomic)

(xi) SEQUENCE DESCRIPTION: SEQ ID NO:32:

CAGCTGCCCG CCCCACATCT
20

WHAT IS CLAIMED IS:



1. An isolated nucleic acid molecule encoding human genomic galactokinase, 
said nucleic acid molecule selected from the group consisting of: 
(a) a nucleic acid molecule comprising the sequence as set forth in SEQ ID NO:7; and 

(b) a nucleic acid molecule differing from the nucleic acid molecule of (a) in 
codon sequence due to the degeneracy of the genetic code. 

2. A vector comprising the nucleic acid molecule of claim 1. 

3. A recombinant host cell comprising the vector of claim 2. 

4. An isolated nucleic acid molecule comprising a DNA sequence that encodes 
nucleotides 29 to 1204 of SEQ ID NO:5 or nucleotides 29 to 265 of SEQ ID NO:6. 

5. A vector comprising the nucleic acid molecule of claim 4. 

6. The vector according to claim 5 which is a plasmid. 

7. A recombinant host cell comprising the vector of claim 5. 



8. A process for preparing a human galactokinase protein comprising 
culturing the recombinant host cell of claim 7 under conditions promoting expression 
of said protein and recovery thereof. 

9. An isolated protein encoded by the DNA sequence of claim 4. 

10. A monoclonal antibody that is specifically reactive with the protein of 
claim 9. 

11. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating a serum or tissue sample from an individual; 
allowing such sample to come in contact with an antibody or antibody fragment 
which specifically binds to the human galactokinase protein of claim 9 under 
conditions such that an antigen-antibody complex is formed between said antibody 
or antibody fragment and said galactokinase protein; and detecting the presence or 
absence of said complex. 



12. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating a nucleic acid sample from an individual; assaying 
said sample and the DNA sequence, or corresponding RNA sequence, that encodes a 
human galactokinase gene; and comparing differences between said sample and said 
DNA (or RNA) that encodes nucleotides 29 to 1204 of SEQ ID NO:4, wherein said 
differences indicate mutations in the human galactokinase gene. 

13. The method of claim 12 wherein said sample is RNA which is 
subsequently amplified by PCR-RT. 

14. The method of claim 13 wherein assaying said sample comprises a 
restriction endonuclease digestion. 

15. The method of claim 14 wherein said restriction endonuclease is Msc I. 

16. The method of claim 12 wherein assaying said sample comprises a 
hybridization assay. 



17. The method of claim 16 wherein the hybridization assay is heteroduplex 
electrophoresis which comprises determining differential mobility of heteroduplex 
products in polyacrylamide gels, said heteroduplex products are the result of 
hybridization between the nucleic acid sample and the DNA sequence, or 
corresponding RNA sequence, that encodes nucleotides 29 to 1204 of SEQ ID NO:4. 



18. The method of claim 12 wherein assaying said sample comprises gel 
electrophoresis of restriction fragment length polymorphisms of said nucleic acid 
sample and the DNA sequence, or corresponding RNA sequence, that encodes 
nucleotides 29 to 1204 of SEQ ID NO:4. 

19. The method of claim 12 wherein assaying said sample comprises DNA 
sequencing. 

20. A method for diagnosing conditions associated with human galactokinase 
deficiency which comprises isolating cells from an individual containing genomic DNA 
and assaying said sample by in situ hybridization using the DNA sequence that 
encodes nucleotides 29 to 1204 of SEQ ID NO:4, nucleotides 29 to 1204 of SEQ ID 
NO:5, or nucleotides 29 to 265 of SEQ ID NO:6; or a fragment that encodes at least 
one exon of said sequence; or a fragment containing at least 15 contiguous base pairs 
of said sequence as a probe. 



21. A transgenic non-human mammal capable of expressing in any cell thereof 
the DNA of claim 4. 
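
Claims 14, 15 and 18 turn on whether a restriction enzyme cuts the sample DNA. Msc I recognizes the palindromic site TGGCCA and cuts between TGG and CCA, leaving blunt ends, so a single base change inside the site abolishes cleavage and changes the fragment pattern seen on a gel. The following is a minimal sketch of that comparison, not the applicants' protocol: the two 18-base sequences are invented for illustration and are not taken from SEQ ID NO:4, and the function name `fragment_lengths` is hypothetical.

```python
import re

MSCI_SITE = "TGGCCA"   # Msc I recognition sequence (palindromic)
MSCI_CUT_OFFSET = 3    # blunt cut between TGG and CCA


def fragment_lengths(seq: str, site: str = MSCI_SITE,
                     cut_offset: int = MSCI_CUT_OFFSET) -> list:
    """Return the fragment lengths produced by cutting a linear sequence at every site."""
    seq = seq.upper()
    # A lookahead pattern reports every occurrence, including overlapping ones.
    cuts = [m.start() + cut_offset for m in re.finditer(f"(?={site})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return [end - start for start, end in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # Toy sequences (assumptions): the second differs by one base inside the site.
    with_site = "AAAAATGGCCATTTTTTT"
    without_site = "AAAAATGGTCATTTTTTT"
    print(fragment_lengths(with_site))      # two fragments
    print(fragment_lengths(without_site))   # one uncut fragment
```

Because the site is its own reverse complement, scanning one strand is sufficient.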
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FIGURE 2(a) 

CCCAGCATCCCGCGCCCACGGGTCTGTGCCGGAGCAGCTGTGCAGAGCTGCACGCGCGCG 
TCATGGCTGCTTTGAGACACCCCCAGGTCGCGGAGCTGCTGGCCGAGCCCCGGCGAGCCT 
MAALRQPQVAELLAEARRA 

TCCGCGAGGAGTTCGGGGCCGAGCCCGAGCTGGCCGTGTCAGCGCCGGGCCGCGTCAACC 118 
FREEFGAEPELAVSAPGRVN 

TCATCGGGGAACACACGGACTACAACCAGGGCCTGGTGCTGCCTATGCTCAGGGGCTGCA 178 
LIGEHTDYNQGLVLPM 

^S^^^^^rc^CCGCCGCCTGTCCCGGTCGCCCAGGACGXCGGGCCTCGGGGA 238 

CC.TGGGGGCGAGTTCTTCCCGCGGGAGATGTGGGGCGGGCAGCTGCGCCTGGAGCAC-G 298 

1 GC ^ GG ^ GAGTCCCCGGGACAGGCT -^TCCCCACGTTGGAAGGGAGGAAGCGAAGAA 358 

GTGGTCCCCAGAGGGTGCGCGGCCGCCTCTTGGCTCAAGCCCGCCCTCTGGGGGCTGGGG 418 

CTCCTCGCCTTCAACCTGGGAGCATGTTCCCCTTAAACTGTGAGCCCCTGTGTGCCACGC 478 

AGAAGGGGACACTCCGCGCCTCCGGCCACCGTGGGGCCCCAACCGCAGACCTGGGCGAAC 538 

GTAGCCTTCTGGCCCAGCCCGTTCAATTTACAGAGGAGGAAACTGAGGCCTAGAGAGGCC 598 

CAGTGAACTGCTGGAGGTCACACAGCAGGTTCTTGGCGGGGCTGCGACTTGGGAGTGAGG 658 

ACTCCCAGCTTTCAGCGGGGGGCGCTTTCCGCCCCATCTGCAGCTTGGGGAGTGCACAGG 718 

TACAGGATGTCCAGAGCCACCCAAAATGTAAAGGCTTTGGAGCTCCAGTGATCTGTTTTC 778 

CCTTTGGGCTAAGCTCTCCCCCCTTGCCCCACAGCTCAGGGCAGACTCCAGGTCTGTGC^ 838 

CCAGCTGCAGCCGCCCCGCCCCTGAAGACCTAAGGGGGCAGGGCTCAAGCCCCCAAGGTC 898 

AGCTGGCCCTCAGGATCrrcCCTGCGACGCTGAACCTGGAGGTTCAGAACCTGATGACTG 958 
TGGAGGCATCAGAACCTCGGCTGGAGGCAGTGTCATTGGAGAGGCTTACTCCAGCTGGCG 1018 
GAAGCCTCACGTACTGClTGTCTrTCCTGCCAGGCTCTGGAGCTCATGACGGTGCTGGTG 1078 

ALELMTVLV 

GGCAGCCCCCGCAAGGATGGGCTGGTGTCTCTCCTCACCACCTCTGAGGGTGCCGATGAG 1138 
GSPRKDGLVSLLTTSEGADE 

CCCCAGCGGCTGCAGTTTCCACTGCCCACAGCCCAGCGCTCGCTGGACCCTGGGACTCCT 1198 
PQRLQFPLPTAQRSLEPGTP 

CGGTGGGCCAACTATGTCAAGGGAGTGATTCAGTACTACCCAGGTATGGGGCCCAGGCCT 1258 
RWANYVKGVIQYYP 
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GAGCCAAGTCCTCACTG ATACTAGGAGTGCCACCTCACAGCCACAGAGCCCATTCATTTG 1318 

TCTGATACACTGTGGGGAAGGCTTGTAGAGTGGAGCATCCCATTGTACAGATGAGGAAAC 1378 

TGATGCCCCCAGAAGGTCGGGAACTTGCCCTGGGTTTCCCGTGACCTGAT7GGAGGAGCC 1438 

AGGATTTGAACCCCAGCCTTTTTTCCCTCCAGAGCCCTAAACCAGGAGGACAATTAGAAG 1498 

TGTCCCAGC AACCTXT AG AGGGTGGGAAAATGGAGGGGAGTGGGTCCCTTGGGCC A GC AGG 1558 

TTGGTGGGGTTCTTGACAATTGAGACACACACCTAGAAACAGTTGCTAGGCCGTTGCTGr 1618 

CCTTCCCGCCAGGACACCTGCCCTTCCTGTCCAATCCTCCCAGGCAGC-CTCTC7TACCA- 1678 

CACCTGTTCTTTCCCCCTGCAGCTGCCCCCCTCCCTGGCTTCAGTGCAGTGGTGGTCAGC 1738 

AAPLPGFSAVVVS 

TCAGTGCCCCTGGCGGGTGGCCTGTCCAGCTCAGCATCCTTGGAAGTGGCCACGTACACC 1798 
SVPLGGGLSSSASLEVATYT 

TTCCTCCAGCAGCTCTCTCCAGGTACCAGCTAGGCCCCAGCCCTGACCCAGCCCTCCTTC 1858 
FLQQLCP 

CCTGAGGTCTCCAGGTGGTCCCAGCTTCTACTATGCCTTATGGAGGGGGTGGCAGGGAAT 1918 
CTCCCTGGAGTGTCATTGAAGCCACTGCTGCTTCCACCAGCCCTAGCCTCCCCACCTCAC 1978 
CCTGTACTGCAGACTCGGGCACAATAGCTGCCCGCGCCCAGGTGTGTCAGCAGGCCGAGC 2038 
DSGTIAARAQVCQQAE 

ACAGCTTCGCAGGGATGCCCTGTGGCATCATGGACCAGTTCATCTCACTTATGGGACAGA 2098 
HSFAGMPCGIMDQFISLMGQ 

AAGGCCACGCGCTGCTCATTGACTGCAGGTTGGGCTCGCTCCCCI'CGTCC r 'C TCCC GCCC 2158 
KGHALLIDCR 
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FIGURE 2(b) 



I'GCACTCAGCAGCTO "l'< ;f :f H'GG ACTGTGCCCACTGCCTGGCGCAGCAAGCACACGCTTG 2218 
GCCTCGTCATCTCCn < "A 'PIT ITAArTfX ACCCCACCTCCTTGGAGACCAGCCTGGTGCCA 2278 

SLETSLVP 

CTCTCGGACCCCAACCTGGCCGTGCTCATCACCAACTCTAATGTCCGCCACTCCCTGGCC 2338 
LSDPKLAVLITNSNVRHSLA 

TCCAGCGAGTACCCTGTCCGGCGGCGCCAATGTGAAGAAGTGGCCCGGGCGCTGGGCAAG 2398 
SSEYPVRRRQCEEVARALGK 

GAAAGCCTCCGGGAGGTACAACTGGAAGAGCTAGAGGGTCAGAACTGCCAGGGTGCTCTA 2458 
ESLREVQLEELE 

TCCTCG ACGCGGCTGTt it. H VCCTGCTCGCCCCTCAGTGTCGCCTTGACCCTGCCTCGGAC 2518 
CCCGATCTCCAGGGGCTTCTGCCATGCTCTCCCCAGTCCCTTCAAACACTGCGCACCCAG 2578 
GGTTCCAATCTCAGCAGGGGTGCTTGAAATCCTAAAATGGTCTTATCTAATCAGAAAAAT 2638 
CATGTTTCCATTGTGGA/\ A7\TGTAGAAAAGTACAAAGTAGAAAATAJ\TAAGCTATAAGGG 2698 
CACTACCCAGAGATAGGCACTGCTGACATTTTCACGTTTCCTTTCAGTATTTTTCCACAT 2758 
CTGTCTTCAAAGCTGAGTATATGTAATATATCATCACTTTCCCCCCCCACCCCCTTTTTT 2818 
TTAAGAGGCAGGGTCTCATTCTGTTGCCCAAGCTGGAGTGTAGTGGTGTGATCATAGCTT 2878 
ACTGCAAACTTGAACTCTTGAGCTCAAGGGATCCTCCCAGCTCAGCCTTCCAAGTAGCTG 2938 
AGATTACAGGTGTGCCACCATGCCCGGCTAATTTTTATCTTCGTAAAGACGGCCTTGTAG 2998 
TGTTGCCCAGGATGATCCTGAACTCTGGCCTCAAGAGGTCCTCCTGCCTTGGGCTCCCAA 3058 
AGTGTTGGGATTATAGGCATGAGCCACTGCGGCCAGCCCATTTGCCGTGTrTTTTTTTTG 3118 
GACACAGAGTTTCGGI-CT'I'^TCACCCATGCTGGAGTGCAATGGTGCGATCTCAGCTCACT 3178 
nTAACCTCTGCCTCrcc;GG'r"ir AAGTGATTCTCCTGCCTCAGCCTCCCGAGTAGCTGGGA 3238 
CTACAGGCGCCCGCrAC'I'ACGCXTGGCACATTTTlTATAGTrCTAGTAGAGACTGGGGTT 3298 
TCACCATGTTGGCCAGGCTt^GTCTCAAACGCCTGACCTCAGGTGATCCTCCCGCCTCAGC 3358 
CTTCCAAAGTGCTGGGATTACAGGCGTGAGCCATAGTGCCGGTCTCTTTTTTTTTTTTTT 3418 
TTAAACTAAACATAATCTCAGAACCCAGAACCCTATCTTATCTTATGCCATGAAAGGCAT 3478 
ATCTCGGCGTGGCTCT'rT'rTTTTTTTTTTTCTTTTTTTTTGGGCGAGGTGGAGGCTTGCC 3538 
CTGTTGCCCAGGCTGGAGTGCAGCGGCGCAATCTCGGTTCACTGCATCCTCCACCTCCTG 3598 
GGTCCAAATGATCCTCCTCCCTTACCTTCCTGAGTAGGTGGGATTACTGGAACCCACCAC 3658 
CACGCCCAGCCAATTTTTATATTTTTAGTAGAGACGGGGTTTCATGTTGGCCAGGCTGGC 3718 
CTCGAACTCCTGAGCTCCTGATCTGCCCGCCTCAGCCTCCCAATGTGCTAGGATTACATG 3778 
TGTGAGCCAGTGCACCTGGCCTCCGTGTGGCTCTTTAAAGCTCCACAATATTTTAGCATT 3838 
CAGGTGCTCTGTCA1TTACTTAJ\CTATTTTCTGATACACCTCACACTGCGATTAACTTTC 3898 
CTTATTTATCTTTTTTATTATTTATTTATTTATTTATTTGAGACAGAGTCTTGCTCTGTC 3958 
ACCCAGGCTGGAGTGCAGTGGCACGATCTCGGCTCACTGCAACGTCTGCCTCCCAGGTTC 4018 
AAGTGATTCTCCTGCCTCAGCCTCCTGAGTAGCTAGGATTAGAGGCATGTGCCACCACAC 4078 
CTGGCTAATCTTCGTATTTTTAGCAGAGATGAGGTTTTACCATGTTGGTCGGGCTGGTCG 4138 
TGAACTCCTGACCTGGTGATCTGCCCACCTCAGCCTCCCAAAGTACTGGGATGACAGGCA 4198 
TGAACCACTGTGCCTGGCCATCTTTTTTATTTTTTAAAGAGATGGGT^CTGCTAAGTTGC 4258 
CCAGGCTGGACCTGAACTCTTGGGCTCAAGTAATCTTCTCACCTAGTCTCCTGGGTAGCT 4318 
GCAACCAAAGGCACCCGGTTTATCTGCATTCTCTTTTTTTTCTTTGAGACTGAGT^ 4378 

TCTGTAGCCCAGGCTGGAGCGCAGTGGCGTGATCTCGGCTCACTGCAACCTCCGTCTTCA 4438 
GGGTTCAAGCAATTCTCCTGCCTCAGCCTCTGGAGTGGCTGGGACTACAGGCGTGTGCCA 4498 
CCAGAGCGAGTTAATTTTTTTTTTTTTTTGTATTTTTAGTGGACACTGGGTTTCACTATA 4558 
TTGGCCAGGCTGGTCTTGGACTCCTGACCTCAAGTGATCCGCCTGCCTTGGCCTCCCAAA 4618 
GTGCTGGGATTACAGGCACAGGCGTGAGCCACTACACCTGGCCTATCTGCATTCTCTTAA 4678 
TAGTTTCTTAGAAATGGATTCTTAGGACTAGGATTACAGAGTCAAGAGACACAAGTTTTG 4738 
TAGGCTGGGTGCGGTGGCTCACCTCTGTGCCTGTAATCCCAGTACTTTAGGAGGCCAAGG 4798 
TGGGCAGATTCATTGAGCTCAGGAATTCGAGACCAGCCTGGGCAACATGGCAAAACCCCA 4858 
TCTCTAAAGAAATACAAAAATTAGCCAGGTGTGGTGGTGTGTGCCTGTAGTCCTAGCTAC 4918 
TTAGGAGGCTGGGGTGGGAGGATCAATTGAGCCCAGGAGGTTGAGACTGCAGTGAGCTGT 4978 
GATTGCACCATGGCACTC-CAGCCTGGGCCTCAAAGTGAGATCCTGTCTCCAAAACAAAAA 5038 
AGATACAAGTATCCrrAv\GGCTCCTGCTACACATGGCCAGGAAGGTAGTCTATTGGACAG 5098 
TTTTAAGGTCATTATCAATA7TAGCTCATTTAATTCCCTCCAAAACTCTGTAAAGCACAT 5158 
TCTGCTACCATAGTTGTL-ATATTTTTGATGGGGGAATCTACAGTGAGAGGCAGTGCTGGG 5218 
ATCTGAACCCCATCTr;i ;A( At JATTACCTCCACGCCCtATGLTC'ITCACTCGCTCCCCCCC 5278 
CTGCCCACACTG AGTT. :T1XT;TTCCTGGCAGGGTAGCTGTGGCTATCTCAGGGACACTAG 5338 
ACAGCTCCGAGGGACC-TCCCTGirCTTTTCCTTTGTGAACTGTGTCACGTTCTCCAGAGC 5398 
AGGGCTCAGACCTGCC'C.'TGCCTGCTCTGTGCAGATGCCCTTGGCCAAGGTTTTCACACTG 5458 
GAACAAGTTCGTCCCTCrTCCCCACCCCAGCCTGTCCTTGGCCCTCCTCCAGGTCTCCTT 5518 
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FIGURE 2(c) 

CTC( ATA i ;a. ;t 'AC- Tf AC.'CCTGCCTCCTCCAnA^TCCTGCCCTAGAAG^GCAAT^c 'CTC 5578 
TCOI-!\:CA7V<TT-Ti:. 'TGCCTGCCTGCCTCCITCCCTCAGCCTCCAACACATCfTr-AGT 5638 
TTTGTTCCCrCCTAAAACACCACCCACTCTCTCAT^ 

TTn^TTrrrr-I^lM-^;AGAGGGACCCTGA ( ;TCTGTCACCCAGGC;TGAAGTGCAG'rGGCA 5758 
TCATCTCCACTCACTCr ■A^C^CCGCCTCCCAGGTTCAAGCAATTCl'CCTCCCTCAGCCT 5818 
CC'IGAGTAGC'TGCt i AT' I'ACAGGCGCCTG GC A' 'GATGCCCGGCTAACTTTTGTATTTTTAG 5878 
TAGACACCCGGTTTi < it CATCTTCCCCAGGCTGGTCTCGACCTCCTGACCTCACGCAATC 5938 
TGCC TCC' .'TCAGL'I" !'(."( CAAAGTGCTGGGATTACAGGTGTGAGCCACCGCGCCCAC'CCAT 5998 

AACrTTCCCCCTl-AAACCTGCGGGTTTGGCCGGCCGCACTACACTGAGTCAGTACTCGTA 6118 
^TGArCCAGGTAC-CCrrcCAGCCTCAGCTCCAGTCAGATGGGACAGCCTCCTGCTCCCTG 6178 
GCTGCrrCTGCCCL-i:TCTTCTGGAGCCCCAGr;CCTGGAGGCTCCATGTGGrrCAGCAG AA 6238 
CTTCT^'CTCG'TC'CTfiCTCTGTGGTGGCCTC'rTGAGGGCAGCACTCAGCTTGGAAAGCATG 6298 
GAGTGTT1X.-AACC-CTCACTCCTCCCTGAAGGACCAAGGTGTCCCATTTTACAGTCGGGGG 6358 
AoGAGGCACTGTGATAAAGGGGCTCTTCAGACCCACGTCTGAGAGAGCCAGGCTGCCCCG 6418 
CCCCCGCCGCCTirCACCCTTCACCGTCCAGCCAGGGCGACTGCCATCACGGCCTGGTGG 6478 
TCC . CA CA GG CG TCGG GG CCCCAG GCAGTG AG AAGGCGGCTGCTG A CTCCTC TTTCCTCC 6538 
CCACCTCCCAGGGACCTGGTGACCAAAGAGGGCTTCCCGCGGGCCCCGCACGTGGTCGGG 6598 
AARDLVSKEGFRRARHVVG 



GAGATTCGGCGCACGGCCCAGGCAGCGGCCGCCCTGAGACGTGGCGACTACAGAGCCTTT 6658 
EIRRTAQAAAALRRGDYRAF 

GGCCGCCTCATGGTCGAGAGCCACCGCTCACTCAGGTGAGCCCCTCTGGGCGCCCCGCTC 6718 
GRLMVESHRSLR 

CTGCCGGGCACAGGCCGGCCCAGCCCCACCCCTTCAATATCCTCTCTGCAGAGACGACTA 6778 

DDY 

TGAGGTGAGCTGCCCAGAGCTGGACCAGCTGGTGGAGGCTGCGCTTGCTGTCCCTGGCCT 6838 
EVSCPELDQLVEAALAVPGV 

TTATGGCAGCCGCATGACGGGCGGTGGCTTCGGTGGCTGCACGGTGACACTGCTGGAGGC 6898 
YGSRMTGGGFGGCTVTLLEA 

CTCCGCTGCTCCCCACGCCATGCGGCACATCCACGTGGGCGGGCACCAGGGCCTGGCCGG 6958 
SAAPHAMRHIQ 

GCAGGAGCGGCAGCTTCCCGGGGCCCTGCCACTCACCCCCAGCCCGCCTCTTACAGGAGC 7018 

E 

ACTACGGCGGGACTGCCACCTTCTACCTCTCTCAAGCAGCCGATGGAGCCAAGGTGCTGT 7078 
HYGGTATFYLSQAADGAKVL 

GCTTGTGAGGCACCCCCAGGACAGCACACGGTGAGGGTGCGGGGCCTGCAGGCCAGTCCC 7138 
CL* 

ACGGCTCTGTGCCCGGTGCCATCTTCCATATCCGGGTGCTCAATAAACTTGTGCCTCCAA 7198 

TGTGGTAGCTGCCTCCTCTAGAGGTGGGTGTATGCTTGGGTGTCAGAGAATGGGGGATGT 7258 

CAGAACCGCTCCCCTACCCTAGGGGAGCACCTCTCAGGCCCCAGAAGAATGGGCAAGGCA 7318 

GGGCCTAGCAGTAGCA7VAACCATTTATTAAGTGCAGAACAAAGGCTGGGTCCTTGTGCTG 7378 

CTCCCAGCTrTTTGGTTACAAATAGGTTTGGGCCCACAGAGGACGGACCTTGCCCCCTTG 7438 

ATCCCTCCCAGCAGACACCTAGCCCCTGCTCTGTGCATGCGGGTGGGCTGGGCCCGCAGG 7498 

GGTGCAAGGATGGAGTACCTGAGGAGGCTCCCCGAGAGGAGTCGGGAGGACCCCTAGTGG 7558 

GACATTGCGGGGGTnoCGCAGGGTGCGGTCAAGTTrGGAAGAAACTGTTGGGTCCA 7614 
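
The one-letter amino acid lines printed under the coding stretches of Figure 2 follow from the standard genetic code. A minimal sketch of that translation is given below, using one 60-base coding line of Figure 2(a) that happens to read in frame from its first base; the names `CODON_TABLE`, `translate` and `EXON_SEGMENT` are illustrative and not part of the application. The last lines also show the degeneracy referred to in claim 1(b): replacing the first codon GGC with the synonymous codon GGT leaves the encoded peptide unchanged.

```python
from itertools import product

# Standard genetic code, listed in TCAG order with the third base varying fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate an in-frame DNA string; trailing bases short of a codon are ignored."""
    dna = dna.upper()
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# One coding line transcribed from Figure 2(a).
EXON_SEGMENT = "GGCAGCCCCCGCAAGGATGGGCTGGTGTCTCTCCTCACCACCTCTGAGGGTGCCGATGAG"

if __name__ == "__main__":
    print(translate(EXON_SEGMENT))
    # Synonymous change (GGC -> GGT, both glycine): same peptide, different DNA.
    print(translate("GGT" + EXON_SEGMENT[3:]))
```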

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